Description

Background of the Invention

[0001] The estimated 50,000-100,000 genes scattered along
the human chromosomes offer tremendous promise for the
understanding, diagnosis, and treatment of human diseases.
In addition, probes capable of specifically hybridizing to
loci distributed throughout the human genome find
applications in the construction of high resolution
chromosome maps and in the identification of individuals.

[0002] In the past, the characterization of even a single
human gene was a painstaking process, requiring years of
effort. Recent developments in the areas of cloning vectors,
DNA sequencing, and computer technology have merged to
greatly accelerate the rate at which human genes can be
isolated, sequenced, mapped, and characterized. Cloning
vectors such as yeast artificial chromosomes (YACs) and
bacterial artificial chromosomes (BACs) are able to accept
DNA inserts ranging from 300 to 1000 kilobases (kb) or
100-400 kb in length respectively, thereby facilitating the
manipulation and ordering of DNA sequences distributed over
great distances on the human chromosomes. Automated DNA
sequencing machines permit the rapid sequencing of human
genes. Bioinformatics software enables the comparison of
nucleic acid and protein sequences, thereby assisting in the
characterization of human gene products.
[0003] Currently, two different approaches are being 
pursued for identifying and characterizing the genes dis- 
tributed along the human genome. In one approach, 
large fragments of genomic DNA are isolated, cloned, 
and sequenced. Potential open reading frames in these 
genomic sequences are identified using bioinformatics
software. However, this approach entails sequencing
large stretches of human DNA which do not encode proteins
in order to find the protein encoding sequences
scattered throughout the genome. In addition to requiring
extensive sequencing, this approach relies on bioinformatics
software which may mischaracterize the genomic sequences
obtained.
Thus, the software may produce false positives in which
non-coding DNA is mischaracterized as coding DNA or
false negatives in which coding DNA is mislabeled as
non-coding DNA.

[0004] An alternative approach takes a more direct route to
identifying and characterizing human genes. In this
approach, complementary DNAs (cDNAs) are synthesized from
isolated messenger RNAs (mRNAs) which encode human proteins.
Using this approach, sequencing is only performed on DNA
which is derived from protein coding portions of the genome.
Often, only short stretches of the cDNAs are sequenced to
obtain sequences called expressed sequence tags (ESTs). The
ESTs may then be used to isolate or purify extended cDNAs
which include sequences adjacent to the EST sequences. The
extended cDNAs may contain all of the sequence of the EST
which was used to obtain them or only a portion of the
sequence of the EST which was used to obtain them. In
addition, the extended cDNAs may contain the full coding
sequence of the gene from which the EST was derived or,
alternatively, the extended cDNAs may include portions of
the coding sequence of the gene from which the EST was
derived. It will be appreciated that there may be several
extended cDNAs which include the EST sequence as a result of
alternate splicing or the activity of alternative promoters.
Alternatively, ESTs having partially overlapping sequences
may be identified and contigs comprising the consensus
sequences of the overlapping ESTs may be identified.

[0005] In the past, these short EST sequences were often
obtained from oligo-dT primed cDNA libraries. Accordingly,
they mainly corresponded to the 3' untranslated region of
the mRNA. In part, the prevalence of EST sequences derived
from the 3' end of the mRNA is a result of the fact that
typical techniques for obtaining cDNAs are not well suited
for isolating cDNA sequences derived from the 5' ends of
mRNAs (Adams et al., Nature 377:3-174, 1996; Hillier et al.,
Genome Res. 6:807-928, 1996).

[0006] In addition, in those reported instances where longer
cDNA sequences have been obtained, the reported sequences
typically correspond to coding sequences and do not include
the full 5' untranslated region (5'UTR) of the mRNA from
which the cDNA is derived. 5'UTRs are often involved in the
regulation of gene expression, by affecting either the
stability or translation of mRNAs. Indeed, 5'UTRs may
contain several features known to affect the initiation of
translation: (i) the distance between the cap structure and
the initiation codon, (ii) the presence of cis-acting
elements which may be either linear sequences such as
polypyrimidine tracts (Kaspar et al., J. Biol. Chem.
267:503-514, 1992; Severson et al., Eur J Biochem
229:426-32, 1995) or secondary structures such as IREs
(Rouault and Klausner, Curr Top Cell Regul 35:1-19, 1997)
and (iii) upstream open reading frames or uORFs (Geballe and
Morris, Trends Biochem Sci 19:159-64, 1994). Thus,
regulation of gene expression may be achieved through the
use of alternative 5'UTRs. For instance, the translation of
the tissue inhibitor of metalloprotease mRNA is enhanced in
mitogenically activated
cells through modification of the start codon of an uORF
in its 5'UTR using an alternative promoter (Waterhouse
et al., J Biol Chem. 265:5585-9, 1990). Furthermore,
modification of the 5'UTR through mutation, insertion or
translocation events may even be implicated in pathogenesis.
For instance, the fragile X syndrome, the most common cause
of inherited mental retardation, is partly due to an
insertion of multiple CGG trinucleotides in the 5'UTR of the
fragile X mRNA resulting in the inhibition of protein
synthesis via ribosome stalling (Feng et al., Science
268:731-4, 1995). An aberrant mutation in regions of the
5'UTR known to inhibit translation of the proto-oncogene
c-myc was shown to result in upregulation of c-myc protein
levels in cells derived from patients with multiple myelomas
(Willis et al., Curr Top Microbiol Immunol 224:269-76,
1997). However, the use of oligo-dT primed cDNA libraries
does not allow the isolation of complete 5'UTRs since the
incomplete sequences so obtained may not include the first
exon of the mRNA, particularly in situations where the first
exon is short. Furthermore, they may not include some exons,
often short ones, which are located upstream of splicing
sites. Thus, there is a need to obtain sequences derived
from the 5' ends of mRNAs.

[0007] While many sequences derived from human 
chromosomes have practical applications, approaches 
based on the identification and characterization of those 
chromosomal sequences which encode a protein product are
particularly relevant to diagnostic and therapeutic uses.
In some instances, the sequences used in such therapeutic
or diagnostic techniques may be sequences which encode
proteins which are secreted from the cell in which they are
synthesized. Such sequences, as well as the secreted
proteins themselves, are particularly valuable as potential
therapeutic agents. Secreted proteins are often involved in
cell to cell communication and may be responsible for
producing a clinically relevant response in their target
cells. In fact, several secretory proteins, including tissue
plasminogen activator, G-CSF, GM-CSF, erythropoietin,
human growth hormone, insulin, interferon-α, interferon-β,
interferon-γ, and interleukin-2, are currently in clinical
use. These proteins are used to treat a wide range
of conditions, including acute myocardial infarction, 
acute ischemic stroke, anemia, diabetes, growth hor- 
mone deficiency, hepatitis, kidney carcinoma, chemo- 
therapy-induced neutropenia and multiple sclerosis. For 
these reasons, extended cDNAs encoding secreted 
proteins or portions thereof represent a valuable source 
of therapeutic agents. Thus, there is a need for the iden- 
tification and characterization of secreted proteins and 
the nucleic acids encoding them. 
[0008] In addition to being therapeutically useful 
themselves, secretory proteins include short peptides, 
called signal peptides, at their amino termini which direct 
their secretion. These signal peptides are encoded by 
the signal sequences located at the 5' ends of the coding 
sequences of genes encoding secreted proteins. These 
signal peptides can be used to direct the extracellular 
secretion of any protein to which they are operably 
linked. In addition, portions of the signal peptides, called
membrane-translocating sequences, may also be used
to direct the intracellular import of a peptide or protein 
of interest. This may prove beneficial in gene therapy 
strategies in which it is desired to deliver a particular 
gene product to cells other than the cell in which it is 
produced. Signal sequences encoding signal peptides 
also find application in simplifying protein purification 
techniques. In such applications, the extracellular se- 
cretion of the desired protein greatly facilitates purifica- 
tion by reducing the number of undesired proteins from 
which the desired protein must be selected. Thus, there
exists a need to identify and characterize the 5' portions
of the genes for secretory proteins which encode signal
peptides.

[0009] Sequences coding for non-secreted proteins may also
find application as therapeutics or diagnostics. In
particular, such sequences may be used to determine whether
an individual is likely to express a detectable phenotype,
such as a disease, as a consequence of a mutation in the
coding sequence for a non-secreted protein or for a secreted
protein. In instances where the individual is at risk of
suffering from a disease or other undesirable phenotype as a
result of a mutation in such a coding sequence, the
undesirable phenotype may be corrected by introducing a
normal coding sequence using gene therapy. Alternatively, if
the undesirable phenotype results from overexpression of the
protein encoded by the coding sequence, expression of the
protein may be reduced using antisense or triple helix based
strategies.

[0010] The secreted or non-secreted human polypeptides
encoded by the coding sequences may also be used as
therapeutics by administering them directly to an individual
having a condition, such as a disease, resulting from a
mutation in the sequence encoding the polypeptide. In such
an instance, the condition can be cured or ameliorated by
administering the polypeptide to the individual.

[0011] In addition, the secreted or non-secreted human
polypeptides or portions thereof may be used to generate
antibodies useful in determining the tissue type or species
of origin of a biological sample. The antibodies may also be
used to determine the cellular localization of the secreted
or non-secreted human polypeptides or the cellular
localization of polypeptides which have been fused to the
human polypeptides. In addition, the antibodies may also be
used in immunoaffinity chromatography techniques to isolate,
purify, or enrich the human polypeptide or a target
polypeptide which has been fused to the human polypeptide.
[0012] Public information on the number of human genes for
which the promoters and upstream regulatory regions have
been identified and characterized is quite limited. In part,
this may be due to the difficulty of isolating such
regulatory sequences. Upstream regulatory sequences such as
transcription factor binding sites are typically too short
to be utilized as probes for isolating promoters from human
genomic libraries. Recently, some approaches have been
developed to isolate human promoters. One of them consists
of making a CpG island library (Cross et al., Nature
Genetics 6:236-244, 1994). The second consists of isolating
human genomic DNA sequences containing SpeI binding sites by
the use of SpeI binding protein (Mortlock et al., Genome
Res. 6:327-335, 1996). Both of these approaches have their
limits due to a lack of specificity or because they are not
universally applicable since only a limited number of
promoters have either a CpG island or a SpeI recognition
site and because SpeI binding sites are
"nucleotide" is also used herein to encompass "modified
nucleotides" which comprise at least one of the following
modifications: (a) an alternative linking group, (b) an
analogous form of purine, (c) an analogous form of
pyrimidine, or (d) an analogous sugar. For examples of
analogous linking groups, purines, pyrimidines, and sugars,
see for example PCT publication No. WO 95/04064. The
polynucleotide sequences of the invention may be prepared by
any known method, including synthetic, recombinant, ex vivo
generation, or a combination thereof, as well as utilizing
any purification methods known in the art.
[0022] The terms "base paired" and "Watson & Crick base
paired" are used interchangeably herein to refer to
nucleotides which can be hydrogen bonded to one another by
virtue of their sequence identities in a manner like that
found in double-helical DNA with thymine or uracil residues
linked to adenine residues by two hydrogen bonds and
cytosine and guanine residues linked by three hydrogen bonds
(see Stryer, L., Biochemistry, 4th edition, 1995).

[0023] The terms "complementary" or "complement thereof" are
used herein to refer to the sequences of polynucleotides
which are capable of forming Watson & Crick base pairing
with another specified polynucleotide throughout the
entirety of the complementary region. For the purpose of the
present invention, a first polynucleotide is deemed to be
complementary to a second polynucleotide when each base in
the first polynucleotide is paired with its complementary
base. Complementary bases are, generally, A and T (or A and
U), or C and G. "Complement" is used herein as a synonym for
"complementary polynucleotide", "complementary nucleic acid"
and "complementary nucleotide sequence". These terms are
applied to pairs of polynucleotides based solely upon their
sequences and not any particular set of conditions under
which the two polynucleotides would actually bind.
Preferably, a "complementary" sequence is a sequence which
has an A at each position where there is a T on the opposite
strand, a T at each position where there is an A on the
opposite strand, a G at each position where there is a C on
the opposite strand and a C at each position where there is
a G on the opposite strand.
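
As an illustration only, the sequence-based definition of complementarity given above can be sketched in Python. The function names below are hypothetical and do not come from the specification; the sketch handles DNA only (A, T, G, C) and assumes both strands are written 5' to 3', so the second strand is reversed before the position-by-position comparison.

```python
# Illustrative sketch; names are hypothetical, DNA alphabet only.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement (A<->T, G<->C) of a DNA sequence."""
    return "".join(DNA_PAIRS[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Return the opposite strand written in its own 5'->3' direction."""
    return complement(seq)[::-1]


def is_complementary(first: str, second: str) -> bool:
    """True when every base of `first` faces its complementary base in `second`.

    Both arguments are assumed to be written 5'->3', so `second` is reversed
    before the position-wise check. Only the sequences are considered, not
    any hybridization conditions.
    """
    if len(first) != len(second):
        return False
    return complement(first) == second.upper()[::-1]
```

For example, `is_complementary("ATGC", reverse_complement("ATGC"))` holds for any input made of the four DNA bases, since the check depends on sequence alone.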

[0024] Thus, 5' ESTs in cDNA libraries in which one or more
5' ESTs make up 5% or more of the number of nucleic acid
inserts in the backbone molecules are "enriched recombinant
5' ESTs" as defined herein. Likewise, 5' ESTs in a
population of plasmids in which one or more 5' ESTs of the
present invention have been inserted such that they
represent 5% or more of the number of inserts in the plasmid
backbone are "enriched recombinant 5' ESTs" as defined
herein. However, 5' ESTs in cDNA libraries in which 5' ESTs
constitute less than 5% of the number of nucleic acid
inserts in the population of backbone molecules, such as
libraries in which backbone molecules having a 5' EST insert
are extremely rare, are not "enriched recombinant 5' ESTs."
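
The 5% threshold in this definition amounts to a simple ratio test. A minimal sketch follows, assuming the counts of 5' EST inserts and of all inserts are already known; the function name and parameters are hypothetical.

```python
def is_enriched(est_inserts: int, total_inserts: int, threshold_percent: int = 5) -> bool:
    """Return True when 5' EST inserts are at least `threshold_percent` of all inserts.

    Integer arithmetic is used so that a library sitting exactly on the
    threshold (e.g. 5 of 100) is classified as enriched.
    """
    if total_inserts <= 0:
        raise ValueError("total_inserts must be positive")
    if not 0 <= est_inserts <= total_inserts:
        raise ValueError("est_inserts must lie between 0 and total_inserts")
    return est_inserts * 100 >= threshold_percent * total_inserts
```

Under this sketch, a library with 5 EST inserts among 100 would count as enriched, while one with 4 among 100 would not.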
[0025] In some embodiments, the present invention relates to
5' ESTs which are derived from genes encoding secreted
proteins. As used herein, a "secreted" protein is one which,
when expressed in a suitable host cell, is transported
across or through a membrane, including transport as a
result of signal peptides in its amino acid sequence.
"Secreted" proteins include without limitation proteins
secreted wholly (e.g. soluble proteins), or partially (e.g.
receptors) from the cell in which they are expressed.
"Secreted" proteins also include without limitation proteins
which are transported across the membrane of the endoplasmic
reticulum.
[0026] Such 5' ESTs include nucleic acid sequences, called
signal sequences, which encode signal peptides which direct
the extracellular secretion of the proteins encoded by the
genes from which the 5' ESTs are derived. Generally, the
signal peptides are located at the amino termini of secreted
proteins.
[0027] Secreted proteins are translated by ribosomes
associated with the "rough" endoplasmic reticulum.
Generally, secreted proteins are co-translationally
transferred to the membrane of the endoplasmic reticulum.
Association of the ribosome with the endoplasmic reticulum
during translation of secreted proteins is mediated by the
signal peptide. The signal peptide is typically cleaved
following its co-translational entry into the endoplasmic
reticulum. After delivery to the endoplasmic reticulum,
secreted proteins may proceed through the Golgi apparatus.
In the Golgi apparatus, the proteins may undergo
post-translational modification before entering secretory
vesicles which transport them across the cell membrane.

[0028] The 5' ESTs of the present invention have several
important applications. For example, they may be used to
obtain and express cDNA clones which include the full
protein coding sequences of the corresponding gene products,
including the authentic translation start sites derived from
the 5' ends of the coding sequences of the mRNAs from which
the 5' ESTs are derived. These cDNAs will be referred to
hereinafter as "full-length cDNAs." These cDNAs may also
include DNA derived from mRNA sequences upstream of the
translation start site. The full-length cDNA sequences may
be used to express the proteins corresponding to the 5'
ESTs. As discussed above, secreted proteins and non-secreted
proteins may be therapeutically important. Thus, the
proteins expressed from the cDNAs may be useful in treating
or controlling a variety of human conditions. The 5' ESTs
may also be used to obtain the corresponding genomic DNA.
The term "corresponding genomic DNA" refers to the genomic
DNA which encodes the mRNA from which the 5' EST was
derived.
[0029] Alternatively, the 5' ESTs may be used to obtain and
express extended cDNAs encoding portions of the protein. In
the case of secreted proteins, the portions may comprise the
signal peptides of the secreted proteins or the mature
proteins generated when the signal peptide is cleaved off.

[0030] The present invention includes isolated, purified, or
enriched "EST-related nucleic acids." The terms "isolated",
"purified" or "enriched" have the meanings provided above.
As used herein, the term "EST-related nucleic acids" means
the nucleic acids of SEQ ID NOs: 24-4100 and 8178-36681,
extended cDNAs obtainable using the nucleic acids of SEQ ID
NOs: 24-4100 and 8178-36681, full-length cDNAs obtainable
using the nucleic acids of SEQ ID NOs: 24-4100 and
8178-36681, or genomic DNAs obtainable using the nucleic
acids of SEQ ID NOs: 24-4100 and 8178-36681. The present
invention also includes the sequences complementary to the
EST-related nucleic acids.

[0031] The present invention also includes isolated,
purified, or enriched "fragments of EST-related nucleic
acids." The terms "isolated", "purified" and "enriched" have
the meanings described above. As used herein, the term
"fragments of EST-related nucleic acids" means fragments
comprising at least 10, 12, 15, 18, 20, 23, 25, 28, 35, 40,
50, 75, 100, 200, 300, 500, or 1000 consecutive nucleotides
of the EST-related nucleic acids to the extent that
fragments of these lengths are consistent with the lengths
of the particular EST-related nucleic acids being referred
to. The present invention also includes the sequences
complementary to the fragments of the EST-related nucleic
acids.
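
To make the "to the extent consistent with the length" qualification concrete, the following Python sketch filters a list of minimum fragment lengths against a given sequence length and enumerates the windows of one chosen length. The constant and function names are hypothetical; windows of exactly `n` nucleotides are the shortest fragments that satisfy an "at least `n` consecutive nucleotides" requirement.

```python
# Minimum fragment lengths as recited in the paragraph above (illustrative copy).
FRAGMENT_LENGTHS = (10, 12, 15, 18, 20, 23, 25, 28, 35, 40,
                    50, 75, 100, 200, 300, 500, 1000)


def applicable_lengths(seq_len: int) -> list[int]:
    """Return the recited minimum lengths that fit within a sequence of `seq_len`."""
    return [n for n in FRAGMENT_LENGTHS if n <= seq_len]


def windows(seq: str, n: int) -> list[str]:
    """Return every run of exactly `n` consecutive nucleotides in `seq`.

    An empty list is returned when `n` exceeds the sequence length.
    """
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]
```

A 12-nucleotide sequence, for instance, contains three distinct start positions for a 10-nucleotide fragment and none for a 15-nucleotide fragment.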
[0032] The present invention also includes isolated,
purified, or enriched "positional segments of EST-related
nucleic acids." The terms "isolated", "purified", or
"enriched" have the meanings provided above. As used herein,
the term "positional segments of EST-related nucleic acids"
includes segments comprising nucleotides 1-25, 26-50, 51-75,
76-100, 101-125, 126-150, 151-175, 176-200, 201-225,
226-250, 251-300, 301-325, 326-350, 351-375, 376-400,
401-425, 426-450, 451-475, 476-500, 501-525, 526-550,
551-575, 576-600 and 601-the terminal nucleotide of the
EST-related nucleic acids to the extent that such nucleotide
positions are consistent with the lengths of the particular
EST-related nucleic acids being referred to. The term
"positional segments of EST-related nucleic acids" also
includes segments comprising nucleotides 1-50, 51-100,
101-150, 151-200, 201-250, 251-300, 301-350, 351-400,
401-450, 450-500, 501-550, 551-600 or 601-the terminal
nucleotide of the EST-related nucleic acids to the extent
that such nucleotide positions are consistent with the
lengths of the particular EST-related nucleic acids being
referred to. The term "positional segments of EST-related
nucleic acids" also includes segments comprising nucleotides
1-100, 101-200, 201-300, 301-400, 401-500, 500-600, or
601-the terminal nucleotide of the EST-related nucleic acids
to the extent that such nucleotide positions are consistent
with the lengths of the particular EST-related nucleic acids
being referred to. In addition, the term "positional
segments of EST-related nucleic acids" includes segments
comprising nucleotides 1-200, 201-400, 400-600 or 601-the
terminal nucleotide of the EST-related nucleic acids to the
extent that such nucleotide positions are consistent with
the lengths of the particular EST-related nucleic acids
being referred to. The present invention also includes the
sequences complementary to the positional segments of
EST-related nucleic acids.
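
The positional segments described above follow a fixed-interval pattern up to position 600, followed by a final segment running from 601 to the terminal nucleotide. The Python sketch below is a simplified illustration under stated assumptions: it uses strictly regular intervals (the recited lists contain a few irregular ranges), it assumes the interval divides 600 evenly, and it reads "to the extent consistent with the length" as dropping any segment that would run past the end of the sequence. The function name and parameters are hypothetical.

```python
def positional_segments(seq_len: int, step: int, last_fixed_end: int = 600) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges for a sequence of `seq_len`.

    Fixed-width segments of `step` nucleotides are produced up to
    `last_fixed_end`; a segment is kept only if it fits entirely within the
    sequence. If the sequence is longer than `last_fixed_end`, one final
    segment covers position `last_fixed_end + 1` through the terminal position.
    """
    if step <= 0 or last_fixed_end % step != 0:
        raise ValueError("step must be positive and divide last_fixed_end")
    segments = []
    start = 1
    while start <= min(seq_len, last_fixed_end):
        end = start + step - 1
        if end > seq_len:
            break  # segment would exceed the sequence, so it is not applicable
        segments.append((start, end))
        start = end + 1
    if seq_len > last_fixed_end:
        segments.append((last_fixed_end + 1, seq_len))
    return segments
```

To extract the nucleotides of a segment from a Python string `seq`, the 1-based inclusive range `(start, end)` corresponds to the slice `seq[start - 1:end]`.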
[0033] The present invention also includes isolated,
purified, or enriched "fragments of positional segments of
EST-related nucleic acids." The terms "isolated", "purified"
or "enriched" have the meanings provided above. As used
herein, the term "fragments of positional segments of
EST-related nucleic acids" refers to fragments comprising at
least 10, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100,
150, or 200 consecutive nucleotides of the positional
segments of EST-related nucleic acids. The present invention
also includes the sequences complementary to the fragments
of positional segments of EST-related nucleic acids.

[0034] The present invention also includes isolated or
purified "EST-related polypeptides." The terms "isolated"
or "purified" have the meanings provided above. As used
herein, the term "EST-related polypeptides" means the
polypeptides encoded by the EST-related nucleic acids,
including the polypeptides of SEQ ID NOs: 4101-8177.

[0035] The present invention also includes isolated or
purified "fragments of EST-related polypeptides." The terms
"isolated" or "purified" have the meanings provided above.
As used herein, the term "fragments of EST-related
polypeptides" means fragments comprising at least 5, 10, 15,
20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino
acids of an EST-related polypeptide to the extent that
fragments of these lengths are consistent with the lengths
of the particular EST-related polypeptides being referred
to.
[0036] The present invention also includes isolated or
purified "positional segments of EST-related polypeptides."
As used herein, the term "positional segments of EST-related
polypeptides" includes polypeptides comprising amino acid
residues 1-25, 26-50, 51-75, 76-100, 101-125, 126-150,
151-175, 176-200, or 201-the C-terminal amino acid of the
EST-related polypeptides to the extent that such amino acid
residues are consistent with the lengths of the particular
EST-related polypeptides being referred to. The term
"positional segments of EST-related polypeptides" also
includes segments comprising amino acid residues 1-50,
51-100, 101-150, 151-200 or 201-the C-terminal amino acid of
the EST-related polypeptides to the extent that such amino
acid residues are consistent with the lengths of the
particular EST-related polypeptides being referred to. The
term "positional segments of EST-related polypeptides" also
includes segments comprising amino acids 1-100 or 101-200 of
the EST-related polypeptides to the extent that such amino
acid residues are consistent with the lengths of the
particular EST-related polypeptides being referred to. In
addition, the term "positional segments of EST-related
polypeptides" includes segments comprising amino acid
residues 1-200 or 201-the C-terminal amino acid of the
EST-related polypeptides to the extent that such amino acid
residues are consistent with the lengths of the particular
EST-related polypeptides being referred to.
[0037] The present invention also includes isolated or
purified "fragments of positional segments of EST-related
polypeptides." The terms "isolated" or "purified" have the
meanings provided above. As used herein, the term "fragments
of positional segments of EST-related polypeptides" means
fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40,
50, 75, 100, or 150 consecutive amino acids of positional
segments of EST-related polypeptides to the extent that
fragments of these lengths are consistent with the lengths
of the particular EST-related polypeptides being referred
to.
[0038] The present invention also includes antibodies 
which specifically recognize the EST-related polypep- 
tides, fragments of EST-related polypeptides, positional 
segments of EST-related polypeptides, or fragments of 
positional segments of EST-related polypeptides. In the 
case of secreted proteins, such as those of SEQ ID NOs:
7798-7888, antibodies which specifically recognize the
mature protein generated when the signal peptide is
cleaved may also be obtained as described below. Similarly,
antibodies which specifically recognize the signal
peptides of SEQ ID NOs: 4101-4729 or 7798-7888 may
also be obtained.

[0039] In some embodiments and in the case of se- 
creted proteins, the EST-related nucleic acids, frag- 
ments of EST-related nucleic acids, positional segments 
of EST-related nucleic acids, or fragments of positional 
segments of nucleic acids include a signal sequence. In 
other embodiments, the EST-related nucleic acids, fragments
of EST-related nucleic acids, positional segments
of EST-related nucleic acids, or fragments of positional
segments of nucleic acids may include the full coding
sequence for the protein or, in the case of secreted
proteins, the full coding sequence of the mature protein
(i.e. the protein generated when the signal polypeptide is
cleaved off). In addition, the EST-related nucleic acids, 
fragments of EST-related nucleic acids, positional seg- 
ments of EST-related nucleic acids, or fragments of po- 
sitional segments of nucleic acids may include regula- 
tory regions upstream of the translation start site or 
downstream of the stop codon which control the 
amount, location, or developmental stage of gene ex- 
pression. 

[0040] As discussed above, both secreted and non- 
secreted human proteins may be therapeutically impor- 
tant. Thus, the proteins expressed from the EST-related 
nucleic acids, fragments of EST-related nucleic acids, 
positional segments of EST-related nucleic acids, or 
fragments of positional segments of nucleic acids may 
be useful in treating or controlling a variety of human 
conditions. 

[0041] The EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of
EST-related nucleic acids, or fragments of positional
segments of nucleic acids may be used in forensic procedures
to identify individuals or in diagnostic procedures to
identify individuals having genetic diseases resulting from
abnormal gene expression. In addition, the EST-related
nucleic acids, fragments of EST-related nucleic acids,
positional segments of EST-related nucleic acids, or
fragments of positional segments of nucleic acids are useful
for constructing a high resolution map of the human
chromosomes.

[0042] The present invention also relates to secretion
vectors capable of directing the secretion of a protein of
interest. Such vectors may be used in gene therapy
strategies in which it is desired to produce a gene product
in one cell which is to be delivered to another location in
the body. Secretion vectors may also facilitate the
purification of desired proteins.

[0043] The present invention also relates to expression
vectors capable of directing the expression of an inserted
gene in a desired spatial or temporal manner or at a desired
level. Such vectors may include sequences upstream of the
EST-related nucleic acids, fragments of EST-related nucleic
acids, positional segments of EST-related nucleic acids, or
fragments of positional segments of nucleic acids, such as
promoters or upstream regulatory sequences.

[0044] The present invention also comprises fusion vectors
for making chimeric polypeptides comprising a first
polypeptide and a second polypeptide. Such vectors are
useful for determining the cellular localization of the
chimeric polypeptides or for isolating, purifying or
enriching the chimeric polypeptides.

[0045] The EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of
EST-related nucleic acids, or fragments of positional
segments of nucleic acids may also be used for gene therapy
to control or treat genetic diseases. In the case of
secreted proteins, signal peptides may be fused to
heterologous proteins to direct their extracellular
secretion.
[0046] Bacterial clones containing Bluescript plasmids
having inserts containing the sequence of the non-clustered
5' ESTs are presently stored at -80°C in 4% (v/v) glycerol
in the inventor's laboratories under the designations. The
non-clustered 5' ESTs are those which comprise a single EST
from a single tissue in the listing of Table II. The inserts
may be recovered from the stored materials by growing the
appropriate clones on a suitable medium. The Bluescript DNA
can then be isolated using plasmid isolation procedures
familiar to those skilled in the art such as alkaline lysis
minipreps or large scale alkaline lysis plasmid isolation
procedures. If desired, the plasmid DNA may be further
enriched by centrifugation on a cesium chloride gradient,
size exclusion chromatography, or anion exchange
chromatography. The plasmid DNA obtained using these
procedures may then be manipulated using standard cloning
techniques familiar to those skilled in the art.
Alternatively, a PCR can be done with primers designed at
both ends of the inserted EST-related nucleic acids,
fragments of EST-related nucleic acids, positional segments
of EST-related nucleic acids, or fragments of positional
segments of nucleic acids. The PCR product which corresponds
to the EST-related nucleic acids, fragments of EST-related
nucleic acids, positional segments of EST-related nucleic
acids, or fragments of positional segments of nucleic acids
can then be manipulated using standard cloning techniques
familiar to those skilled in the art.
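
As a rough illustration of what "primers designed at both ends" of an insert means at the sequence level, the Python sketch below takes the first bases of the insert as a forward primer and the reverse complement of its last bases as a reverse primer. This is a simplification and not a procedure from the specification: the function names and the default primer length of 20 are assumptions, and practical primer design would also weigh melting temperature, GC content, and secondary structure, none of which is modeled here.

```python
# Illustrative sketch; names and default primer length are hypothetical.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return "".join(DNA_PAIRS[base] for base in reversed(seq.upper()))


def end_primers(insert: str, primer_len: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers taken from the two ends of `insert`.

    The forward primer matches the 5' end of the given strand; the reverse
    primer is the reverse complement of the 3' end, so that it anneals to
    the given strand and extends back toward the forward primer.
    """
    if len(insert) < 2 * primer_len:
        raise ValueError("insert is too short for two non-overlapping primers")
    forward = insert[:primer_len].upper()
    reverse = reverse_complement(insert[-primer_len:])
    return forward, reverse
```

With the toy insert `ATGGCCAAGTTTCCGGAAC` and a primer length of 5, this sketch yields the forward primer `ATGGC` and the reverse primer `GTTCC`.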
[0047] One embodiment of the present invention is a
purified nucleic acid comprising a sequence selected
from the group consisting of SEQ ID NOs: 24-4100 and
SEQ ID NOs: 8178-36681 and sequences complementary to
the sequences of SEQ ID NOs: 24-4100 and SEQ
ID NOs: 8178-36681.

[0048] Another embodiment of the present invention 
is a purified nucleic acid comprising at least 10 consec- 
utive nucleotides of a sequence selected from the group 
consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681 and sequences complementary to the se- 
quences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681. 

[0049] Another embodiment of the present invention
is a purified nucleic acid comprising at least 15 consecutive
nucleotides of a sequence selected from the group
consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 and sequences complementary to the
sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681.

[0050] A further embodiment of the present invention
is a purified nucleic acid comprising the coding
sequence of a sequence selected from the group
consisting of SEQ ID NOs: 24-4100.

[0051] Yet another embodiment of the present inven- 
tion is a purified nucleic acid comprising the full coding 
sequences of a sequence selected from the group con- 
sisting of SEQ ID NOs: 3721-3811 wherein the full cod- 
ing sequence comprises the sequence encoding the 
signal peptide and the sequence encoding the mature 
protein. 

Still another embodiment of the present invention is a 
purified nucleic acid comprising a contiguous span of a 
sequence selected from the group consisting of SEQ ID 
NOs: 3721-3811 which encodes the mature protein. 
[0052] Another embodiment of the present invention
is a purified nucleic acid comprising a contiguous span
of a sequence selected from the group consisting of
SEQ ID NOs: 24-652 and 3721-3811 which encodes the
signal peptide.

[0053] Another embodiment of the present invention 
is a purified nucleic acid encoding a polypeptide com- 
prising a sequence selected from the group consisting 
of the sequences of SEQ ID NOs: 4101-8177. 
[0054] Another embodiment of the present invention 
is a purified nucleic acid encoding a polypeptide com- 
prising a sequence selected from the group consisting 
of the sequences of SEQ ID NOs: 7798-7888. 
[0055] Another embodiment of the present invention
is a purified nucleic acid encoding a polypeptide
comprising a mature protein included in a sequence selected
from the group consisting of the sequences of SEQ ID
NOs: 7798-7888.

[0056] Another embodiment of the present invention
is a purified nucleic acid encoding a polypeptide
comprising a signal peptide included in a sequence selected
from the group consisting of the sequences of SEQ ID
NOs: 4101-4729 and 7798-7888.
[0057] Another embodiment of the present invention
is a purified nucleic acid at least 15, 18, 20, 23, 25, 28,
30, 35, 40, 50, 75, 100, 200, 300, 500 or 1000 nucleotides
in length which hybridizes under stringent conditions
to a sequence selected from the group consisting
of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681
and sequences complementary to the sequences of
SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681.
[0058] Another embodiment of the present invention
is a purified or isolated polypeptide comprising a
sequence selected from the group consisting of the
sequences of SEQ ID NOs: 4101-8177.
[0059] Another embodiment of the present invention
is a purified or isolated polypeptide comprising a
sequence selected from the group consisting of SEQ ID
NOs: 7798-7888.

[0060] Another embodiment of the present invention
is a purified or isolated polypeptide comprising a mature
protein of a polypeptide selected from the group
consisting of SEQ ID NOs: 7798-7888.
[0061] Another embodiment of the present invention
is a purified or isolated polypeptide comprising a signal
peptide of a sequence selected from the group consisting
of the polypeptides of SEQ ID NOs: 4101-4729 and
7798-7888.

[0062] Another embodiment of the present invention
is a purified or isolated polypeptide comprising at least
10 consecutive amino acids of a sequence selected
from the group consisting of the sequences of SEQ ID
NOs: 4101-8177.

[0063] Another embodiment of the present invention
is a method of making a cDNA comprising the steps of
contacting a collection of mRNA molecules from human
cells with a primer comprising at least 15 consecutive
nucleotides of a sequence selected from the group
consisting of the sequences complementary to SEQ ID
NOs: 24-4100 and SEQ ID NOs: 8178-36681, hybridizing
said primer to an mRNA in said collection that
encodes said protein, reverse transcribing said hybridized
primer to make a first cDNA strand from said mRNA,
making a second cDNA strand complementary to said
first cDNA strand and isolating the resulting cDNA
encoding said protein comprising said first cDNA strand
and said second cDNA strand.

[0064] Another embodiment of the present invention
is a purified cDNA obtainable by the method of the
preceding paragraph.

[0065] In one aspect of this embodiment, the cDNA 
encodes at least a portion of a human polypeptide. 
[0066] Another embodiment of the present invention 
is a method of making a cDNA comprising the steps of 
obtaining a cDNA comprising a sequence selected from 
SEQ ID NOs: 24-4100 and SEQ ID NOs:
Ldtons which permit said probe to hybndize to said 

Zi e *t ipast a Dortion of a human polypeptide. 

nntaAtea o! said mRNA, hybridizing sad first pnmer to 

. ^Trvto said first cDNA strand using at least one 
plementary to sa,d « si v9 mcleo ^ s 

Sb'ssssssss--'" 

[0070] Another embodiment of the present invention
is a purified cDNA obtainable by the method of the
preceding paragraph. In one aspect of this embodiment,
said cDNA encodes at least a portion of a human polypeptide.

second cDNA strand is made by contacting; aid ft n*cD 
NA strand with a first pair of primers. said first pair £ 

said first primer, performing a first polymerase chain ts 

S eonsecut- nucteotides o, said sequence 

r« saXTa S nd 8 ,il«h hybridize to seances 
saw first PCB product, and performing a second 
Slrase chain reaction, thereby generating a sec 

c^N Attainable by the method of the preceding para- 
ft tnanotheraspec.o.thisembodiment.saacD. 



„. ^^to- a , least a portion of a human polypeptide. 

the second cDNA strand may be
made by contacting said first cDNA strand with a second
primer comprising at least 15 consecutive nucleotides
of a sequence selected from the group consisting
of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681,
hybridizing said second primer to said first strand cDNA,
and extending said hybridized second primer to
generate said second cDNA strand.
[0076] One aspect of the above embodiment is a
purified cDNA obtainable by the method of the preceding
paragraph. In one aspect of this embodiment, said
cDNA encodes at least a portion of a human polypeptide.
[0078] Another embodiment of the present invention
is a method of making a polypeptide comprising the
steps of obtaining a cDNA which encodes a polypeptide
encoded by a nucleic acid comprising a sequence
selected from the group consisting of SEQ ID NOs:
24-4100 or a cDNA which encodes a polypeptide
comprising at least 10 consecutive amino acids of a
polypeptide encoded by a sequence selected from the group
consisting of SEQ ID NOs: 24-4100, inserting said cDNA
in an expression vector such that said cDNA is
operably linked to a promoter, introducing said expression
vector into a host cell whereby said host cell produces
the protein encoded by said cDNA, and isolating said
protein.
[0079] Another aspect of this embodiment is an
isolated protein obtainable by the method of the preceding
paragraph.
[0080] Another embodiment of the present invention
is a method of obtaining a promoter DNA comprising the
steps of obtaining genomic DNA located upstream of a
nucleic acid comprising a sequence selected from the
group consisting of SEQ ID NOs: 24-4100 and SEQ ID
NOs: 8178-36681 and the sequences complementary
to the sequences of SEQ ID NOs: 24-4100 and SEQ ID
NOs: 8178-36681, screening said genomic DNA to
identify a promoter capable of directing transcription
initiation, and isolating said DNA comprising said
identified promoter.

In one aspect of this embodiment, said screening step comprises
inserting genomic DNA located upstream of a sequence
selected from the group consisting of SEQ ID NOs:
24-4100 and SEQ ID NOs: 8178-36681 and the
sequences complementary to SEQ ID NOs: 24-4100 and
SEQ ID NOs: 8178-36681 into a promoter reporter vector.
For example, said screening step may comprise
identifying motifs in genomic DNA located upstream of
a sequence selected from the group consisting of SEQ
ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 and the

C r n . n mos 24-4100 and ShQ 1U n^s>- oi»« 
an °.!agme n is comprising a, leas, 15 consecu,*e nu- 

ed from the group consisting ol Stu iu 

1 ,L™.e readable medium having stored thereon 
l^T 7 Ino.her embodiment ot the present invention 



isacomputersystemcomprisingaprocessorandadata 
stor^T device 'wherein said data storage device has 

sisting ol a nucleic acid code ot SEQIC > NOs_ 24 4100 
s and 8178-36681 and a polypeptide code lol SEQ ID 
4101-8177. in one aspect of this embodiment the 
^p Ute r system lurther comprises a sequence , c om- 
SI and a data storage device having reterencfi a* 
fences s».ed thereon. For examp.e. the sequence 
,0 Srermaycompriseacomputer program which „v 

lemtrtner comprises an identifier which identilies fea- 

Another embodiment of the present invention
is a method for comparing a first sequence to a
reference sequence wherein said first sequence is selected
from the group consisting of a nucleic acid code of SEQ
ID NOs: 24-4100 and 8178-36681 and a polypeptide
code of SEQ ID NOs: 4101-8177 comprising the steps
of reading said first sequence and said reference
sequence through use of a computer program which
compares sequences and determining differences between
said first sequence and said reference sequence with
said computer program. In one aspect of this
embodiment said step of determining differences between the
first sequence and the reference sequence comprises

S^rretrimen, o, the present invention 
» sTmethod lo, identifying a feature in a sequence se- 
^ from the croup consisting , of a nucfeicac 

3S ^a c^pSter program which identifies , = ^se- 
quences and identifying features .n sa.d sequence w,tn 

^rCtrem^imen, of the present invention 
^vlctoTSmprising a nude, acid according to any 

acids described above comprising the s^^W 8 
telly linking together the nucleotides m sa.d nuclei 

together the amino acids in said polypeptide. 
[0094] Another embodiment of the present invention
is a method of making any of the polypeptides described
above wherein said polypeptide is 120 amino acids in
length or less comprising the step of sequentially linking
together the amino acids in said polypeptide.

Brief Description of the Sequence Listing 



[0095] SEQ ID NOs: 1, 3, 5, 7, 9, 11, and 13 are
full-length cDNAs prepared using the methods described
herein.

[0096] SEQ ID NOs: 2, 4, 6, 8, 10, 12, and 14 are the
polypeptides encoded by the nucleic acids of SEQ ID
NOs: 1, 3, 5, 7, 9, 11, and 13.

[0097] SEQ ID NOs: 15, 16, 18, 19, 21 and 22 are
primers whose use is described in the specification.
[0098] SEQ ID NOs: 17, 20, and 23 are the sequences
of nucleic acids containing transcription factor binding
sites which were obtained as described below.
[0099] SEQ ID NOs: 24-652 are nucleic acids having
an incomplete ORF which encodes a signal peptide. As 
used herein, an "incomplete ORF" is an open reading 
frame in which a start codon has been identified but no 
stop codon has been identified. The locations of the in- 
complete ORFs and sequences encoding signal pep- 
tides are listed in the accompanying Sequence Listing. 
In addition, the von Heijne score of the signal peptide 
computed as described below is listed as the "score" in 
the accompanying Sequence Listing. The sequence of 
the signal-peptide is listed as "seq" in the accompanying 
Sequence Listing. The V in the signal peptide sequence 
indicates the location where proteolytic cleavage of the 
signal peptide occurs to generate a mature protein. 
[0100] SEQ ID NOs: 653-3720 are nucleic acids hav- 
ing an incomplete ORF in which no sequence encoding 
a signal peptide has been identified to date. However, it 
remains possible that subsequent analysis will identify 
a sequence encoding a signal peptide in these nucleic 
acids. The locations of the incomplete ORFs are listed 
in the accompanying Sequence Listing. 
[0101] SEQ ID NOs: 3721-3811 are nucleic acids
having a complete ORF which encodes a signal peptide. As
used herein, a "complete ORF" is an open reading frame
in which a start codon and a stop codon have been
identified. The locations of the complete ORFs and
sequences encoding signal peptides are listed in the
accompanying Sequence Listing. In addition, the von Heijne
score of the signal peptide computed as described
below is listed as the "score" in the accompanying
Sequence Listing. The sequence of the signal-peptide is
listed as "seq" in the accompanying Sequence Listing.
The V in the signal peptide sequence indicates the
location where proteolytic cleavage of the signal peptide
occurs to generate a mature protein.
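The distinction between a "complete ORF" and an "incomplete ORF" drawn in this section can be illustrated with a minimal sketch. It assumes, purely for illustration, that the start codon is the first ATG in the sequence unless a start position is supplied; this specification does not state how start codons were chosen, and the function name is hypothetical.

```python
# Illustrative sketch only: label an ORF "complete" (start and stop codon found)
# or "incomplete" (start codon found, no in-frame stop codon).
# Taking the first ATG as the start codon is an assumption made for this example.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_orf(seq, start=None):
    """Return (label, start, end) for the reading frame beginning at `start`.

    `end` is the index just past the stop codon for a complete ORF, or just
    past the last full codon for an incomplete ORF. Returns None when no
    start codon is found.
    """
    seq = seq.upper()
    if start is None:
        start = seq.find("ATG")
    if start < 0:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return ("complete", start, i + 3)
    end = start + 3 * ((len(seq) - start) // 3)
    return ("incomplete", start, end)
```

A 5' EST that ends before the stop codon of its mRNA would be reported as "incomplete" by this sketch, which matches the definition used for SEQ ID NOs: 24-3720.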
[0102] SEQ ID NOs: 3812-4100 are nucleic acids
having a complete ORF in which no sequence encoding
a signal peptide has been identified to date. However, it
remains possible that subsequent analysis will identify
a sequence encoding a signal peptide in these nucleic
acids. The locations of the complete ORFs are listed in
the accompanying Sequence Listing.
[0103] SEQ ID NOs: 4101-4729 are "incomplete
polypeptide sequences" which include a signal peptide.
"Incomplete polypeptide sequences" are polypeptide
sequences encoded by nucleic acids in which a start
codon has been identified but no stop codon has been
identified. These polypeptides are encoded by the
nucleic acids of SEQ ID NOs: 24-652. The location of the
signal peptide is listed in the accompanying Sequence
Listing. In addition, the von Heijne score of the signal
peptide computed as described below is listed as the
"score" in the accompanying Sequence Listing. The
sequence of the signal-peptide is listed as "seq" in the
accompanying Sequence Listing. The V in the signal
peptide sequence indicates the location where proteolytic
cleavage of the signal peptide occurs to generate a
mature protein.

[0104] SEQ ID NOs: 4730-7797 are incomplete
polypeptide sequences in which no signal peptide has
been identified to date. However, it remains possible
that subsequent analysis will identify a signal peptide in
these polypeptides. These polypeptides are encoded by
the nucleic acids of SEQ ID NOs: 653-3720.

[0105] SEQ ID NOs: 7798-7888 are "complete
polypeptide sequences" which include a signal peptide.
"Complete polypeptide sequences" are polypeptide
sequences encoded by nucleic acids in which a start
codon and a stop codon have been identified. These
polypeptides are encoded by the nucleic acids of SEQ
ID NOs: 3721-3811. The location of the signal peptide
is listed in the accompanying Sequence Listing. In
addition, the von Heijne score of the signal peptide
computed as described below is listed as the "score" in the
accompanying Sequence Listing. The sequence of the
signal-peptide is listed as "seq" in the accompanying
Sequence Listing. The V in the signal peptide sequence
indicates the location where proteolytic cleavage of the
signal peptide occurs to generate a mature protein.
[0106] SEQ ID NOs: 7889-8177 are complete
polypeptide sequences in which no signal peptide has
been identified to date. However, it remains possible
that subsequent analysis will identify a signal peptide in
these polypeptides. These polypeptides are encoded by
the nucleic acids of SEQ ID NOs: 3812-4100.
[0107] SEQ ID NOs: 8178-36681 are nucleic acid
sequences in which no open reading frame has been
conclusively identified to date. However, it remains possible
that subsequent analysis will identify an open reading frame
in these nucleic acids.

[0108] In the accompanying Sequence Listing, all
instances of the symbol "n" in the nucleic acid sequences
mean that the nucleotide can be adenine, guanine,
cytosine or thymine. In some instances the polypeptide
sequences in the Sequence Listing contain the symbol
"Xaa." These "Xaa" symbols indicate either (1) a residue
which cannot be identified because of nucleotide
sequence ambiguity or (2) a stop codon in the determined
sequence where applicants believe one should not exist
(if the sequence were determined more accurately). In
some instances, several possible identities of the
unknown amino acids may be suggested by the genetic
code.

Brief Description of the Drawings

[0109] Figure 1 summarizes the computer analysis
procedure for obtaining consensus contigated ESTs.
Figure 2 is an analysis of the 43 amino terminal
amino acids of all human SwissProt proteins to
determine the frequency of false positives and false
negatives using the techniques for signal peptide identification

^irC^^-tesmethodstormaKingextend- 

[0112] Figure 4 provides a schematic description of
the promoters isolated and the way they are assembled
with the corresponding 5' tags.
[0113] Figure 5 describes the transcription factor
binding sites present in each of these promoters.

Detailed Description of the Preferred Embodiment

I. General Methods for Obtaining 5' ESTs derived
from mRNAs with intact 5' ends

[0114] In order to obtain the 5' ESTs of the present
invention, mRNAs with intact 5' ends must be obtained.
Example 1 below describes the preparation of 5' ESTs.

EXAMPLE 1 



Preparation of mRNA 

[0115] Total human RNAs or polyA+ RNAs derived
from 30 different tissues were respectively purchased
from LABIMO and CLONTECH and used to generate
cDNA libraries as described below. The purchased
RNA had been isolated from cells or tissues using acid
guanidium thiocyanate-phenol-chloroform extraction
(Chomczynski and Sacchi, Analytical Biochemistry
162:156-159, 1987). PolyA+ RNA was isolated from total
RNA (LABIMO) by two passes of oligo dT chromatography
as described by Aviv and Leder (Proc. Natl.
Acad. Sci. USA 69:1408-1412, 1972) in order to
eliminate ribosomal RNA.
[0116] The quality and the integrity of the polyA+
RNAs were checked. Northern blots hybridized with a
globin probe were used to confirm that the mRNAs were
not degraded. Contamination of the polyA+ mRNAs by
ribosomal sequences was checked using Northern blots
and a probe derived from the sequence of the 28S rRNA.
Preparations of mRNAs with less than 5% of rRNAs
were used in library construction. To avoid constructing
libraries with RNAs contaminated by exogenous
sequences (prokaryotic or fungal), the presence of
bacterial 16S ribosomal sequences or of two highly expressed
fungal mRNAs was examined using PCR.

[0117] Following preparation of the mRNAs from
various tissues, an oligonucleotide tag was
attached to the caps at the 5' ends of the mRNAs. The
oligonucleotide tag had an EcoRI site therein to facilitate
later cloning procedures. Following attachment of the
oligonucleotide tag to the mRNA, the integrity of the
mRNA was examined by performing a Northern blot with
200 to 500 ng of mRNA using a probe complementary
to the oligonucleotide tag before performing the first
strand synthesis described in Example 2.

EXAMPLE 2

cDNA Synthesis Using mRNA Templates Having Intact
5' Ends

[0118] For the mRNAs joined to oligonucleotide tags,
first strand cDNA synthesis was performed using a
reverse transcriptase with random nonamers as primers.
In order to protect internal EcoRI sites in the cDNA from
digestion at later steps in the procedure, methylated
dCTP was used for first strand synthesis. After removal
of RNA by an alkaline hydrolysis, the first strand of cDNA
was precipitated using isopropanol in order to eliminate
residual primers.

[0119] The second strand of the cDNA was synthesized
with a Klenow fragment using a primer corresponding
to the 5' end of the ligated oligonucleotide.
Methylated dCTP was also used for second strand synthesis
in order to protect internal EcoRI sites in the cDNA
from digestion during the cloning process.
[0120] Following cDNA synthesis, the cDNAs were
cloned into pBlueScript as described in Example 3 below.



EXAMPLE 3 

Cloning of cDNAs derived from mRNA with intact 5' ends
into BlueScript

[0121] Following second strand synthesis, the ends
of the cDNA were blunted with T4 DNA polymerase
(Biolabs) and the cDNA was digested with EcoRI. Since
methylated dCTP was used during cDNA synthesis, the
EcoRI site present in the tag was the only hemi-methylated
site, hence the only site susceptible to EcoRI
digestion. The cDNA was then size fractionated using
exclusion chromatography (AcA, Biosepra) and fractions
corresponding to cDNAs of more than 50 bp were
pooled and ethanol precipitated. The cDNA was
directionally cloned into the SmaI and EcoRI ends of the
phagemid pBlueScript vector (Stratagene). The ligation
mixture was electroporated into bacteria and propagated
under appropriate antibiotic selection.
[0122] Clones containing the oligonucleotide tag
attached were then selected as described in Example 4
below.

EXAMPLE 4 

Selection of Clones Having the Oligonucleotide Tag
Attached Thereto

[0123] The plasmid DNAs containing 5' EST libraries
made as described above were purified (Qiagen). A
positive selection of the tagged clones was performed
as follows. Briefly, in this selection procedure, the
plasmid DNA was converted to single stranded DNA using
gene II endonuclease of the phage F1 in combination
with an exonuclease (Chang et al., Gene 127:95-8,
1993) such as exonuclease III or T7 gene 6 exonuclease.
The resulting single stranded DNA was then
purified using paramagnetic beads as described by Fry et
al., Biotechniques, 13:124-131, 1992. In this procedure,
the single stranded DNA was hybridized with a
biotinylated oligonucleotide having a sequence corresponding
to the 3' end of the oligonucleotide tag. Clones
including a sequence complementary to the biotinylated
oligonucleotide were captured by incubation with
streptavidin coated magnetic beads followed by magnetic
selection. After capture of the positive clones, the
plasmid DNA was released from the magnetic beads
and converted into double stranded DNA using a DNA
polymerase such as the ThermoSequenase obtained

ed DNA was then electroporated into bacteria. The
percentage of positive clones having the 5' tag
oligonucleotide was estimated to typically rank between 90 and

ordered in 384-microtiter plates (MTP). A copy of the
MTP was stored for future needs. Then the libraries
were transferred into 96 MTP and sequenced as
described below.



EXAMPLE 5

Sequencing of Inserts in Selected Clones

[0125] Plasmid inserts were first amplified by PCR on
PE 9600 thermocyclers (Perkin-Elmer, Applied Biosystems
Division, Foster City, CA), using standard SETA-A
and SETA-B primers (Genset SA), AmpliTaqGold
(Perkin-Elmer), dNTPs (Boehringer), buffer and cycling
conditions as recommended by the Perkin-Elmer Corporation.

[0126] PCR products were then sequenced using
automatic ABI Prism 377 sequencers (Perkin Elmer).
Sequencing reactions were performed using PE 9600
thermocyclers with standard dye-primer chemistry and
ThermoSequenase (Amersham Pharmacia Biotech).
The primers used were either T7 or 21M13 (available
from Genset SA) as appropriate. The primers were
labeled with the JOE, FAM, ROX and TAMRA dyes. The
dNTPs and ddNTPs used in the sequencing reactions
were purchased from Boehringer. Sequencing buffer,
reagent concentrations and cycling conditions were as
recommended by Amersham.

[0127] Following the sequencing reaction, the
samples were precipitated with ethanol, resuspended in
formamide loading buffer, and loaded on a standard 4%
acrylamide gel. Electrophoresis was performed for 2.5
hours at 3000V on an ABI 377 sequencer, and the
sequence data were collected and analyzed using the ABI
Prism DNA Sequencing Analysis Software, version
2.1.2.



EXAMPLE 6

Obtaining 5' ESTs from Full-length cDNA Libraries
Obtained from mRNAs with Intact 5' Ends

[0128] Alternatively, 5'ESTs may be isolated from
other cDNA or genomic DNA libraries. Such cDNA or
genomic DNA libraries may be obtained from a commercial
source or made using other techniques familiar to those
skilled in the art. One example of such cDNA library
construction, a full-length cDNA library, is as follows.

[0129] PolyA+ RNAs are prepared and their quality
checked as described in Example 1. Then, the caps at
the 5' ends of the polyA+ RNAs are specifically joined to
an oligonucleotide tag. The oligonucleotide tag may
contain a restriction site such as Eco RI to facilitate
further subcloning procedures. Northern blotting is then
performed to check the size of mRNAs having the
oligonucleotide tag attached thereto and to ensure that the
mRNAs were actually tagged.
[0130] First strand synthesis is subsequently carried
out for mRNAs joined to the oligonucleotide tag as
described in Example 2 above except that the random
nonamers are replaced by an oligo-dT primer. For instance,
this oligo-dT primer may contain an internal tag of 4
nucleotides which is different from one tissue to the other.
Following second strand synthesis using a primer
contained in the oligonucleotide tag attached to the 5' end
of mRNA, the blunt ends of the obtained double stranded
full-length DNAs are modified into cohesive ends to
facilitate subcloning. For example, the extremities of
full-length cDNAs may be modified to allow subcloning
into the Eco RI and Hind III sites of a Bluescript vector
using the Eco RI site of the oligonucleotide tag and the
addition of a Hind III adaptor to the 3' end of full-length
cDNAs.

[0131] The full-length cDNAs are then separated into
several fractions according to their sizes using
techniques familiar to those skilled in the art. For example,
electrophoretic separation may be applied in order to
yield 3 or 6 different fractions. Following gel extraction
and purification, the cDNA fractions are subcloned into
appropriate vectors, such as Bluescript vectors,
transformed into competent bacteria and propagated under
appropriate antibiotic conditions. Subsequently,
plasmids containing tagged full-length cDNAs are positively
selected as described in Example 4.
[0132] The 5' end of full-length cDNAs isolated from
such cDNA libraries may then be sequenced as
described in Example 5.

II. Computer Analysis of the Isolated 5' ESTs:
construction of NetGene™ and SignalTag™
databases

[0133] The sequence data from the 42 cDNA libraries
made as described above were transferred to a
database, where quality control and validation steps were
performed. A base-caller, working using a Unix system,
automatically flagged suspect peaks, taking into
account the shape of the peaks, the inter-peak resolution,
and the noise level. The proprietary base-caller also
performed an automatic trimming. Any stretch of 25 or
fewer bases having more than 4 suspect peaks was
considered unreliable and was discarded. Sequences
corresponding to cloning vector or ligation oligonucleotides
were automatically removed from the EST sequences.
However, the resulting EST sequences may contain 1
to 5 bases belonging to the above mentioned
sequences at their 5' end. If needed, these can easily be
removed on a case to case basis.
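The trimming rule stated above (a stretch of 25 or fewer bases containing more than 4 suspect peaks is unreliable) can be sketched as a sliding-window check. This is not the proprietary base-caller; it assumes the suspect-peak flags are already available as one boolean per base, and it marks the whole 25-base window as unreliable, which is one possible reading of the rule. The function name and defaults are illustrative.

```python
# Illustrative sketch only: flag bases lying in any window of `window` bases
# that contains more than `max_suspect` suspect peaks.
# Marking the entire window is an assumption made for this example.
def unreliable_mask(suspect, window=25, max_suspect=4):
    """Return a list of booleans, True where the base falls in an unreliable stretch."""
    n = len(suspect)
    # Prefix sums give the suspect-peak count of any window in constant time.
    prefix = [0]
    for flag in suspect:
        prefix.append(prefix[-1] + (1 if flag else 0))
    mask = [False] * n
    for start in range(n):
        end = min(start + window, n)
        if prefix[end] - prefix[start] > max_suspect:
            for i in range(start, end):
                mask[i] = True
    return mask
```

Bases where the mask is True would be discarded; how the surviving fragments were rejoined or trimmed is not described in the text, so the sketch stops at the mask.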
[0134] Following sequencing as described above, the
sequences of the 5' ESTs were entered in NetGene™,
a database for storage and manipulation as described
below and as depicted in Figure 1. Before searching the
ESTs in the NetGene™ database for sequences of
interest, ESTs derived from mRNAs which were not of
interest, such as endogenous or exogenous contaminants,
redundant sequences, small sequences, highly
degenerate sequences, or repeated sequences were
identified and eliminated from further consideration.
[0135] In order to determine the accuracy of the
sequencing procedure as well as the efficiency of the 5'
selection described above, the analyses described in
Examples 7 and 8 respectively were performed on
5'ESTs obtained from the NetGene™ database following
the elimination of sequences which were not of interest.

EXAMPLE 7 

Measurement of Sequencing Accuracy by Comparison
to Known Sequences

[0136] To further determine the accuracy of the
sequencing procedure described in Example 5, the
sequences of NetGene™ 5' ESTs derived from known
sequences were identified and compared to the original
known sequences. First, a FASTA analysis with
overhangs shorter than 5 bp on both ends was conducted
on the 5' ESTs to identify those matching an entry in the
public human mRNA database. The 6655 5' ESTs which
matched a known human mRNA were then realigned
with their cognate mRNA and dynamic programming
was used to include substitutions, insertions, and
deletions in the list of "errors" which would be recognized.
Errors occurring in the last 10 bases of the 5' EST
sequences were ignored to avoid the inclusion of spurious
cloning sites in the analysis of sequencing accuracy.
[0137] This analysis revealed that the sequences
incorporated in the NetGene™ database had an
accuracy of more than 99.5%.
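The error-counting step of this example can be illustrated with a minimal dynamic-programming sketch. It covers only the realignment of one EST against its cognate mRNA, not the FASTA search, and it assumes unit costs for substitutions, insertions and deletions and an alignment anchored at the 5' end of both sequences; none of these details are stated in the text. The function names are hypothetical.

```python
# Illustrative sketch only: count substitutions, insertions and deletions between
# an EST and a reference by edit-distance dynamic programming.
# Unit costs and 5'-anchored alignment are assumptions made for this example.
def count_errors(est, reference, ignore_tail=10):
    """Align the EST (minus its last `ignore_tail` bases) to a prefix of the reference."""
    query = est[:len(est) - ignore_tail]
    rows, cols = len(query) + 1, len(reference) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if query[i - 1] == reference[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,
                             dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1)
    # The EST may stop before the reference does, so pick the best reference prefix.
    i = len(query)
    j = min(range(cols), key=lambda col: dist[i][col])
    counts = {"substitutions": 0, "insertions": 0, "deletions": 0}
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and query[i - 1] != reference[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (1 if mismatch else 0):
            if mismatch:
                counts["substitutions"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["insertions"] += 1   # extra base present in the EST
            i -= 1
        else:
            counts["deletions"] += 1    # reference base missing from the EST
            j -= 1
    return counts


def sequencing_accuracy(est, reference, ignore_tail=10):
    """Return 1 minus the error rate over the compared part of the EST."""
    compared = len(est) - ignore_tail
    if compared <= 0:
        raise ValueError("EST is too short once the ignored tail is removed")
    errors = sum(count_errors(est, reference, ignore_tail).values())
    return 1 - errors / compared
```

Averaging such per-EST figures over every EST that matched a known mRNA would give a database-wide accuracy of the kind reported above.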

EXAMPLE 8

Determination of Efficiency of 5' EST Selection

[0138] To determine the efficiency at which the above
selection procedures isolated 5' ESTs which included
sequences close to the 5' end of the mRNAs from which
they derived, the sequences of the ends of the 5' ESTs
derived from the elongation factor 1 subunit α and ferritin
heavy chain genes were compared to the known cDNA
sequences of these genes. Since the transcription start
sites of both genes are well characterized, they may be
used to determine the percentage of derived 5' ESTs
which included the authentic transcription start sites.
[0139] For both genes, more than 95% of the obtained
5' ESTs actually included sequences close to or
upstream of the 5' end of the corresponding mRNAs.
[0140] To extend the analysis of the reliability of the
procedures for isolating 5' ESTs from ESTs in the
NetGene™ database, a similar analysis was conducted
using a database composed of human mRNA sequences
extracted from GenBank database release 97 for
comparison. The 5' ends of more than 85% of 5' ESTs
derived from mRNAs included in the GenBank database
were located close to the 5' ends of the known
sequence. As some of the mRNA sequences available in
the GenBank database are deduced from genomic
sequences, a 5' end matching with these sequences will
be counted as an internal match. Thus, the method used
here underestimates the yield of ESTs including the
authentic 5' ends of their corresponding mRNAs.

EXAMPLE 9 

Clustering of the 5' ESTs

[0141] Since the cDNA libraries made above include
multiple 5' ESTs derived from the same mRNA,
overlapping 5'ESTs may be assembled into continuous
sequences. The following method (see Figure 1) describes
how to efficiently cluster 5'ESTs in order to yield not only
consensus 5'EST sequences for mRNAs derived from
different genes but also consensus 5'EST sequences
for different mRNAs, so called variants, transcribed from
the same gene such as alternatively spliced mRNAs.
This clustering was performed on a set of NetGene™
5'EST sequences following elimination of endogenous
contaminants, elimination of uninformative sequences
and masking of repeats.

[0142] The whole set of sequences was first
partitioned into smaller sets, so-called clusters, containing
sequences exhibiting perfect matches with each other
on a given length. Such clusters contain ESTs derived
from a small number of different genes. Some 5' EST
sequences were not clustered using this approach either
because they were not homologous to any other
sequence or because the homology was not properly
detected. To overcome this problem, sequences not
clustered, so called singletons, may be compared to the
consensus contigated ESTs obtained later on and, if
necessary, included in the appropriate clusters and used to
compute other consensus contigated ESTs.
[0143] Thereafter, all variants of a given gene were
identified in each cluster as follows. Overlapping sequences
inside a given cluster were figured as oriented
graphs where each sequence was a node and each
overlap an edge. Then, the different genes contained
within a single graph, which formed different connex
components, were identified and isolated
from each other. Subsequently, the different variants of
a same gene were isolated using an algorithm based on
the detection of forks within a connex component. If desired,
the consensus contigated EST sequences may
be verified by identifying clones in nucleic acid samples
derived from biological tissues, such as cDNA libraries,
which hybridize to the probes based on the sequences
of the consensus contigated ESTs and sequencing them.
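
The separation of genes as connex (connected) components of an overlap graph can be illustrated with a minimal sketch. It is not the algorithm actually used for the NetGene data: the sequence identifiers and the overlap list are hypothetical, edge orientation is ignored because connectivity only needs adjacency, and the fork-detection step used to separate variants is not shown.

```python
from collections import defaultdict


def connected_components(nodes, overlaps):
    """Group sequence IDs into connected components of the overlap graph.

    nodes    -- iterable of sequence identifiers (hypothetical IDs)
    overlaps -- iterable of (a, b) pairs meaning sequences a and b overlap
    """
    neighbours = defaultdict(set)
    for a, b in overlaps:
        # Orientation is dropped here: only adjacency matters for connectivity.
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited node.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


# Hypothetical cluster: est1-est3 overlap in a chain, est4 and est5 overlap each other.
ests = ["est1", "est2", "est3", "est4", "est5"]
edges = [("est1", "est2"), ("est2", "est3"), ("est4", "est5")]
components = connected_components(ests, edges)
```

In this toy cluster the two components would correspond to two different genes that happened to fall into the same cluster.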

[0144] Overlapping 5'EST sequences belonging to
the same variant as well as included 5'EST sequences
belonging to the same cluster were then contigated and
consensus contigated 5'EST sequences were generated
for each variant. Some of the obtained consensus
contigated 5'EST sequences were incomplete due to
the fact that only included and overlapping 5'EST sequences
were considered to isolate genes and due to
the algorithm developed to find variants. These variant
consensus contigated 5'EST sequences were extended
as follows. Variants transcribed from the same gene
were compared pairwise and the 5' EST consensus sequences
that were incomplete either in 5' and/or in 3'
were extended with the appropriate sequence from the
other variants. All 5' EST consensus sequences eventually
completed in 5' or 3' from each other were subsequently
compared to the whole set of individual 5'EST
sequences obtained for this cluster.
EXAMPLE 10 

Identification of the Most Probable Open Reading
Frame of 5' ESTs

[0145] Subsequently, the most probable coding open
reading frame (ORF) may be determined for each consensus
assembled 5'EST or 5'EST as follows.
[0146] Each nucleic acid sequence is first divided into
several subsequences whose coding propensity is evaluated
using different methods known to those skilled in
the art such as the evaluation of N-mer frequency and
its variants (Fickett and Tung, Nucleic Acids Res. 20:
6441-50 (1992)) or the Average Mutual Information
method (Grosse et al., International Conference on Intelligent
Systems for Molecular Biology, Montreal, Canada,
June 28-July 1, 1993). Each of the scores obtained
by the techniques described above is then normalized
by its distribution extremities and then fused using a
neural network into a unique score that represents the
coding probability of a given subsequence.
[0147] The coding probability scores obtained for
each subsequence, thus the probability score profiles
obtained for each reading frame, are then linked to the
initiation codons present on the sequence. For each
open reading frame, defined as a nucleic acid sequence
of at least 50 nucleotides beginning with an ATG codon,
an ORF score is determined. Basically, this score is the
sum of the probability scores computed for each subsequence
corresponding to the considered ORF in the correct
reading frame, corrected by a function that negatively
weights locally high score values and positively
weights sustained high score values. The chosen
ORF is the one with the highest score.
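
The ORF selection rule can be sketched as follows. This is an illustration under stated assumptions, not the scoring actually used: `coding_prob` is a hypothetical per-nucleotide coding probability supplied by the caller, the ORF score is the plain sum of those values, and the correction function that penalizes locally high scores is omitted because the text does not specify it.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_ORF_NT = 50  # minimum ORF length in nucleotides, as stated in the text


def find_orfs(seq):
    """Yield (start, end) for every ATG-initiated ORF of at least MIN_ORF_NT nt.

    An ORF runs from an ATG to the first in-frame stop codon (stop included),
    or to the last complete codon when no stop codon is found.
    """
    seq = seq.upper()
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        end = start
        for pos in range(start, len(seq) - 2, 3):
            end = pos + 3
            if seq[pos:pos + 3] in STOP_CODONS:
                break
        if end - start >= MIN_ORF_NT:
            yield start, end


def best_orf(seq, coding_prob):
    """Return ((start, end), score) for the ORF with the highest summed score.

    coding_prob[i] is a hypothetical coding probability for nucleotide i.
    """
    best, best_score = None, float("-inf")
    for start, end in find_orfs(seq):
        score = sum(coding_prob[start:end])
        if score > best_score:
            best, best_score = (start, end), score
    return best, best_score


# Toy sequence: ATG, twenty GCT codons, then a TAA stop (66 nucleotides).
toy_seq = "ATG" + "GCT" * 20 + "TAA"
toy_orfs = list(find_orfs(toy_seq))
toy_best = best_orf(toy_seq, [0.5] * len(toy_seq))
```

A sequence with no ATG-initiated ORF of sufficient length yields `(None, -inf)`, which the caller would need to handle.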

[0148] Two kinds of ORFs are considered. In some
embodiments, 5'ESTs encoding ORFs of at least 50
amino acids extending up to the end of the consensus
assembled 5'EST sequences are obtained. In other embodiments,
5'ESTs encoding complete ORFs, namely
ORFs with start and stop codons, containing at least 100
amino acids are obtained.



EXAMPLE 11 

Sequence Analysis

[0149] Application of the clustering method described
in Example 9 to a selected set of 126,735 NetGene™
5'ESTs free from endogenous contaminants and uninformative
sequences yielded 9490 consensus assembled
5'EST sequences or variants for a total of 8037
genes clustered representing 98,973 individual 5'ESTs.
One of them, which contained 21,138 sequences and
was shown to contain chimeras thanks to comparison
to public sequences, was removed from further analysis.
[0150] Both non clustered 5'ESTs, i.e. singletons, and
consensus contigated 5'ESTs were then compared to
already known sequences as follows. Those sequences
matching human mRNA sequences were eliminated
from further analysis. Then, following masking of repeats,
those sequences matching sequences that have
already been discovered by the inventors, namely sequences
exhibiting more than 90% homology over
stretches longer than 40 nucleotides using BLAST2N
with overhangs shorter than 10 nucleotides, were removed
from further consideration. The final set represents
the sequences of the invention (SEQ ID NOs:
24-4100 and 8178-36681), i.e., 7609 consensus contigated
5'ESTs from 6398 clusters containing 31,267
5'ESTs and 24,972 singletons.
[0151] Of the 6398 obtained clusters, 658 were shown
to be multivariant, i.e. to contain several variants of the
same gene. Table I gives for each of the multivariant
clusters named by its internal reference (first column)
the list of the consensus sequences of all variants, each
variant being represented by a different SEQ ID NO.
[0152] Subsequently, the most probable open reading
frame was determined, as described in Example 10, for
all sequences of the invention. 3,697 5'ESTs (SEQ ID
NOs: 24-3720) encoding incomplete ORFs (SEQ ID
NOs: 4101-7797) of at least 50 amino acids long were
found, and 5'ESTs
encoding complete ORFs (SEQ ID NOs: 7798-8177) of
at least 100 amino acids were found.
[0153] The nucleotide sequences of SEQ ID NOs:
24-4100 and 8178-36681 and the amino acid sequences
encoded by SEQ ID NOs: 24-4100 (i.e. amino acid
sequences of SEQ ID NOs: 4101-8177) are provided in
the appended sequence listing. Some of the amino acid 
sequences may contain 'Xaa' designators. These Xaa 
designators indicate either (1) a residue which cannot 
be identified because of nucleotide sequence ambiguity 
or (2) a stop codon in the determined sequence where 
applicants believe one should not exist (if the sequence 
were determined more accurately). 
[0154] If one of the nucleic acid sequences of SEQ ID
NOs: 24-4100 and 8178-36681 is suspected of containing
one or more incorrect or ambiguous nucleotides,
the ambiguities can readily be resolved by resequencing
a fragment containing the nucleotides to be evaluated.
If one or more incorrect or ambiguous nucleotides are
detected, the corrected sequences should be included
in the clusters from which the sequences were isolated,
and used to compute other consensus contigated sequences
on which other ORFs would be identified. Nucleic
acid fragments for resolving sequencing errors or
ambiguities may be obtained from deposited clones or
can be isolated using the techniques described herein.
Resolution of any such ambiguities or errors may be facilitated
by using primers which hybridize to sequences
located close to the ambiguous or erroneous sequences.
For example, the primers may hybridize to sequences
within 50-75 bases of the ambiguity or error. Upon
resolution of an error or ambiguity, the corresponding
corrections can be made in the protein sequences encoded
by the DNA containing the error or ambiguity. The
amino acid sequence of the protein encoded by a particular
clone can also be determined by expression of
the clone in a suitable host cell, collecting the protein,
and determining its sequence.
[0155] In addition, if one of the sequences of SEQ ID
NOs: 4101-8177 is suspected of containing a truncated
ORF as the result of a frameshift in the sequence,
such frameshifting errors may be corrected by combining
the following two approaches. The first one involves
thorough examination of all double predictions, i.e. all
cases where the probability scores for two ORFs located
on different reading frames are high and close, preferably
different by less than 0.4. The fine examination of
the region where the two possible ORFs overlap may
help to detect the frameshift. In the second approach,
homologies with known proteins are used to correct suspected
frameshifts.
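
The "double prediction" criterion of the first approach can be sketched as a simple pairwise filter. The text gives the 0.4 score gap but no numerical definition of a "high" score, so `min_score` below is a hypothetical parameter, as are the ORF names and scores in the example.

```python
from itertools import combinations


def double_predictions(orfs, min_score, max_gap=0.4):
    """Return pairs of ORF names that form a double prediction.

    orfs      -- list of (name, reading_frame, score) tuples
    min_score -- hypothetical cutoff for a "high" score (not given in the text)
    max_gap   -- maximum score difference, 0.4 per the text
    """
    flagged = []
    for (name_a, frame_a, score_a), (name_b, frame_b, score_b) in combinations(orfs, 2):
        if frame_a == frame_b:
            continue  # only ORFs on different reading frames are of interest
        both_high = min(score_a, score_b) >= min_score
        close = abs(score_a - score_b) < max_gap
        if both_high and close:
            flagged.append((name_a, name_b))
    return flagged


# Hypothetical ORF scores for one sequence.
example_orfs = [("orfA", 0, 5.0), ("orfB", 1, 4.75), ("orfC", 2, 1.0), ("orfD", 0, 4.9)]
flagged_pairs = double_predictions(example_orfs, min_score=4.0)
```

Each flagged pair marks a region that would then be examined by hand where the two ORFs overlap.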



EXAMPLE 12

Identification of Potential Signal Sequences in 5' ESTs

[0156] The amino acid sequences of SEQ ID NOs:
4101-8177 were then searched to identify potential signal
motifs using slight modifications of the procedures
disclosed in Von Heijne, Nucleic Acids Res. 14:
4683-4690, 1986. Those sequences encoding a 15 amino
acid long stretch with a score of at least 3.5 in the
Von Heijne signal peptide identification matrix were considered
to possess a signal sequence and were included
in a database called SIGNALTAG™.
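
The windowed scoring can be sketched as follows. The weights below are invented toy values, not the published Von Heijne matrix, and the "slight modifications" mentioned in the text are not reproduced; only the 15-residue window and the 3.5 cutoff come from the text.

```python
WINDOW = 15      # length of the scored stretch, from the text
THRESHOLD = 3.5  # minimum score for a putative signal sequence, from the text


def best_window_score(protein, matrix, default=0.0):
    """Return (score, start) of the best-scoring 15-residue window, or None.

    matrix[i] maps an amino acid to its weight at window position i.
    Residues absent from matrix[i] contribute `default`.
    """
    best = None
    for start in range(len(protein) - WINDOW + 1):
        window = protein[start:start + WINDOW]
        score = sum(matrix[i].get(aa, default) for i, aa in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


def has_signal_sequence(protein, matrix):
    """True if any 15-residue window scores at least THRESHOLD."""
    best = best_window_score(protein, matrix)
    return best is not None and best[0] >= THRESHOLD


# Toy weights (hypothetical): hydrophobic residues favoured, aspartate penalized.
toy_matrix = [{"L": 0.5, "A": 0.25, "D": -0.5} for _ in range(WINDOW)]
hydrophobic = "M" + "L" * 14 + "DDDD"
acidic = "M" + "D" * 16
```

With the toy weights, `hydrophobic` passes the cutoff and `acidic` does not; a real analysis would substitute the published position-specific weights.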
[0157] The sequences of the 720 nucleic acid sequences
containing a signal sequence (SEQ ID NOs:
24-652 and 3721-3811) and the corresponding polypeptides
with a potential signal peptide (SEQ ID NOs:
4101-4729 and 7798-7888) are provided in the Sequence
Listing appended hereto. The signal peptides of
such polypeptides are indicated as features in the appended
Sequence Listing. It should be noted that, in accordance
with the regulations governing Sequence Listings,
in the appended Sequence Listing, the full protein
(i.e., the protein containing the signal peptide and the
mature protein) extends from an amino acid residue
having a negative number through a positively numbered
C-terminal amino acid residue. Thus, the first amino
acid of the mature protein resulting from cleavage of
the signal peptide is designated as amino acid number
1, and the first amino acid of the signal peptide is designated
with the appropriate negative number.
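
The numbering convention just described can be made concrete with a short sketch. The peptide and the signal peptide length used in the example are hypothetical.

```python
def number_residues(protein, signal_length):
    """Number residues so the mature protein starts at 1.

    Signal peptide residues receive negative numbers counting back from the
    cleavage site; there is no residue numbered 0.
    """
    numbered = []
    for index, aa in enumerate(protein):
        offset = index - signal_length
        number = offset if offset < 0 else offset + 1
        numbered.append((number, aa))
    return numbered


# Hypothetical 5-residue peptide with a 3-residue signal peptide.
numbering = number_residues("MKLAQ", 3)
```

Here the first signal peptide residue is numbered -3 and the first residue of the mature protein is numbered 1.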

[0158] To confirm the accuracy of the above method 
for identifying signal sequences, the analysis of Exam- 
ple 13 was performed. 

EXAMPLE 13

Confirmation of Accuracy of Identification of Potential
Signal Sequences in 5' ESTs

[0159] The accuracy of the above procedure for identifying
signal sequences encoding signal peptides was
evaluated by applying the method to the 43 amino acids
located at the N terminus of all human SwissProt proteins.
The computed Von Heijne score for each protein
was compared with the known characterization of the
protein as being a secreted protein or a non-secreted
protein. In this manner, the number of non-secreted proteins
having a score higher than 3.5 (false positives) and
the number of secreted proteins having a score lower
than 3.5 (false negatives) could be calculated.
[0160] Using the results of the above analysis, the 
probability that a peptide encoded by the 5' region of the 
mRNA is in fact a genuine signal peptide based on its 
Von Heijne's score was calculated based on either the 
assumption that 10% of human proteins are secreted or 
the assumption that 20% of human proteins are secret- 
ed. The results of this analysis are shown in Figure 2. 
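
One way to combine the false positive and false negative counts with an assumed fraction of secreted proteins is Bayes' rule, sketched below. This is an illustration of the calculation, not necessarily the exact one behind Figure 2, and the rates in the example are hypothetical rather than the SwissProt results.

```python
def posterior_signal_peptide(true_positive_rate, false_positive_rate, prior_secreted):
    """Probability that a peptide scoring above the cutoff is a genuine signal peptide.

    true_positive_rate  -- fraction of secreted proteins scoring above the cutoff
    false_positive_rate -- fraction of non-secreted proteins scoring above the cutoff
    prior_secreted      -- assumed fraction of secreted proteins (0.10 or 0.20 in the text)
    """
    numerator = true_positive_rate * prior_secreted
    denominator = numerator + false_positive_rate * (1.0 - prior_secreted)
    if denominator == 0:
        raise ValueError("no protein scores above the cutoff under these rates")
    return numerator / denominator


# Hypothetical rates, evaluated under both assumptions from the text.
p_10 = posterior_signal_peptide(0.9, 0.1, 0.10)
p_20 = posterior_signal_peptide(0.9, 0.1, 0.20)
```

For the same rates, the higher assumed fraction of secreted proteins gives the higher posterior probability, which is why the two assumptions yield two curves.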
[0161] Using the above method of identification of se- 
cretory proteins, 5' ESTs of the following polypeptides 
known to be secreted were obtained: human glucagon, 
gamma interferon induced monokine precursor, secreted
cyclophilin-like protein, human pleiotropin, and human
biotinidase precursor. Thus, the above method
successfully identified those 5' ESTs which encode a 
signal peptide. 

[0162] To confirm that the signal peptide encoded by 
the 5' ESTs or contigated consensus 5' ESTs actually 
functions as a signal peptide, the signal sequences from 
the 5' ESTs or consensus 5' ESTs may be cloned into a 
vector designed for the identification of signal peptides. 
Such vectors are designed to confer the ability to grow
in selective medium only to host cells containing a vector 
with an operably linked signal sequence. For example, 
to confirm that a 5' EST or consensus 5' EST encodes 
a genuine signal peptide, the signal sequence of the 5' 
EST or consensus 5' EST may be inserted upstream 
and in frame with a non-secreted form of the yeast invertase
gene in signal peptide selection vectors such as
those described in U.S. Patent No. 5,536,637. Growth
of host cells containing signal sequence selection vectors
with the correctly inserted 5' EST or consensus 5'
EST signal sequence confirms that the 5' EST or consensus
5' EST encodes a genuine signal peptide.
[0163] Alternatively, the presence of a signal peptide
may be confirmed by cloning the extended cDNAs ob- 
tained using the ESTs or consensus 5' ESTs into expres- 
sion vectors such as pXT1 as described below, or by 
constructing promoter-signal sequence-reporter gene 
vectors which encode fusion proteins between the sig- 
nal peptide and an assayable reporter protein. After
introduction of these vectors into a suitable host cell, such
as COS cells or NIH 3T3 cells, the growth medium may 
be harvested and analyzed for the presence of the se- 
creted protein. The medium from these cells is com- 
pared to the medium from control cells containing vec- 
tors lacking the signal sequence or extended cDNA in- 
sert to identify vectors which encode a functional signal 
peptide or an authentic secreted protein. 

EXAMPLE 14 

Assessment of the novelty rate of 5'ESTs

[0164] To assess the yield of new sequences, the obtained
5'ESTs and consensus contigated 5'ESTs were
compared to all known human mRNAs extracted from
the EMBL release 57 and daily updates available at the
time of filing. The comparison was performed using
BLAST2N on both strands following masking of the repeats.
Sequences having more than 95% homology
with public sequences over their whole length with at
most 10 nucleotide overhangs on each extremity were
considered as previously identified. Thus, about 90% of
5'ESTs or consensus assembled 5'ESTs were considered
unidentified.
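
The "previously identified" criterion can be expressed as a small predicate over an alignment summary. The argument names and the coordinate convention (0-based start, end exclusive, on the query) are assumptions made for this sketch and do not correspond to any particular BLAST output format.

```python
def previously_identified(query_length, align_start, align_end, percent_identity,
                          min_identity=95.0, max_overhang=10):
    """Apply the novelty criterion to one query/public-sequence alignment.

    align_start, align_end -- aligned region on the query (0-based, end exclusive)
    percent_identity       -- percent identity over the aligned region
    """
    left_overhang = align_start
    right_overhang = query_length - align_end
    return (percent_identity > min_identity
            and left_overhang <= max_overhang
            and right_overhang <= max_overhang)


# Hypothetical alignments of a 300-nucleotide 5' EST.
known = previously_identified(300, 5, 295, 97.0)        # short overhangs, high identity
partial = previously_identified(300, 50, 300, 99.0)     # 50-nucleotide 5' overhang
borderline = previously_identified(300, 0, 300, 95.0)   # exactly 95%, not "more than"
```

A sequence is counted as unidentified when no public sequence satisfies the predicate.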


II. 3. Evaluation of Spatial and Temporal Expression 
of mRNAs Corresponding to the 5'ESTs or Extended 
cDNAs 

[0165] Each of the SEQ ID NOs: 24-4100 and
8178-36681 was also categorized based on the tissue
from which its corresponding mRNA was obtained, as
described below in Example 15.



EXAMPLE 15



Expression Patterns of mRNAs From Which the 5'ESTs 
were obtained 



[0166] Table II shows the spatial distribution of each
of the 5'ESTs (non-clustered ESTs) and of each consensus
contigated EST respectively. Table II provides the
SEQ ID NOs of the 5' ESTs (referred to alternatively
herein as non-clustered ESTs or singletons) and consensus
contigated ESTs. Table II also lists the number
of ESTs from each type of tissue which were used to
assemble the contigated consensus ESTs. The SEQ ID
NOs in Table II which contain a single 5' EST from a
single tissue are 5' ESTs. Each type of tissue listed in
Table II is encoded by a letter. The correspondence between
the letter code and the tissue type is given in Table
III. For example, the consensus contigated EST of SEQ
ID NO: 47 contains one 5'EST from cancerous prostate,
two 5'ESTs from lymph ganglia, and two 5'ESTs from
testes.

[0167] In addition to categorizing the 5' ESTs and consensus
contigated 5' ESTs with respect to their tissue of
origin, the spatial and temporal expression patterns of
the mRNAs corresponding to the 5' ESTs and consensus
contigated 5' ESTs, as well as their expression levels,
may be determined as described in Example 16 below.

[0168] Characterization of the spatial and temporal
expression patterns and expression levels of these mRNAs
is useful for constructing expression vectors capable
of producing a desired level of gene product in a desired
spatial or temporal manner, as will be discussed
in more detail below.

[0169] Furthermore, 5' ESTs and consensus contigated
5' ESTs whose corresponding mRNAs are associated
with disease states may also be identified. For example,
a particular disease may result from the lack of
expression, over expression, or under expression of a

EST or consensus conti-

quences adjacent to the 5' ESTs and

gated 5' ESTs. It will also be appreciated that if desired

EXAMPLE 16 

mRNAs Corresponding to EST-Related Nucleic Acids



the biotin-UTP modification enables capture of



No. 2 305 241 A. In this method, cDNAs are prepared
from a cell, tissue, organism or other source of nucleic
acid for which gene expression patterns must be determined.
The resulting cDNAs are separated into two
pools. The cDNAs in each pool are cleaved with a first
restriction endonuclease, called an anchoring enzyme,
having a recognition site which is likely to be present at
least once in most cDNAs. The fragments which contain
the 5' or 3' most region of the cleaved cDNA are isolated
by binding to a capture medium such as streptavidin
coated beads. A first oligonucleotide linker having a first
sequence for hybridization of an amplification primer
and an internal restriction site for a so called tagging
endonuclease is ligated to the digested cDNAs in the
first pool. Digestion with the second endonuclease produces
short tag fragments from the cDNAs.
[0173] A second oligonucleotide having a second sequence
for hybridization of an amplification primer and
an internal restriction site is ligated to the digested cDNAs
in the second pool. The cDNA fragments in the second
pool are also digested with the tagging endonuclease
to generate short tag fragments derived from the
cDNAs in the second pool. The tags resulting from digestion
of the first and second pools with the anchoring
enzyme and the tagging endonuclease are ligated to
one another to produce so called ditags. In some embodiments,
the ditags are concatamerized to produce
ligation products containing from 2 to 200 ditags. The
tag sequences are then determined and compared to
the sequences of the EST-related nucleic acid, fragment
of an EST related nucleic acid, positional segment of an
EST-related nucleic acid, or fragment of a positional
segment of an EST-related nucleic acid to determine
which 5' ESTs, contigated consensus 5' ESTs, or extended
cDNAs are expressed in the cell, tissue, organism,
or other source of nucleic acids from which the tags
were derived. In this way, the expression pattern of the
5' ESTs, contigated consensus 5' ESTs, or extended cDNAs
in the cell, tissue, organism, or other source of nucleic
acids is obtained.

[0174] Quantitative analysis of gene expression may
also be performed using arrays. As used herein, the
term array means a one dimensional, two dimensional,
or multidimensional arrangement of EST-related nucleic
acids, fragments of EST related nucleic acids, positional
segments of EST-related nucleic acids, or fragments of
positional segments of EST-related nucleic acids. Preferably,
the EST-related nucleic acids, fragments of EST
related nucleic acids, positional segments of EST-related
nucleic acids, or fragments of positional segments of
EST-related nucleic acids are at least 15 nucleotides in
length. More preferably, the EST-related nucleic acids,
fragments of EST related nucleic acids, positional segments
of EST-related nucleic acids, or fragments of positional
segments of EST-related nucleic acids are at least
100 nucleotides long. More preferably, the fragments are
more than 100 nucleotides in length. In some embodiments,
the EST-related nucleic acids, fragments of EST
related nucleic acids, positional segments of EST-related
nucleic acids, or fragments of positional segments of
EST-related nucleic acids may be more than 500 nucleotides
long.
[0175] For example, quantitative analysis of gene expression
may be performed with EST-related nucleic acids,
fragments of EST related nucleic acids, positional
segments of EST-related nucleic acids, or fragments of
positional segments of EST-related nucleic acids in a
complementary DNA microarray as described by Schena et
al. (Science 270:467-470, 1995; Proc. Natl. Acad. Sci.
U.S.A. 93:10614-10619, 1996). EST-related nucleic acids,
fragments of EST related nucleic acids, positional
segments of EST-related nucleic acids, or fragments of
positional segments of EST-related nucleic acids are amplified
by PCR and arrayed from 96-well microtiter plates
onto silylated microscope slides using high-speed robotics.
Printed arrays are incubated in a humid chamber
to allow rehydration of the array elements and rinsed,
once in 0.2% SDS for 1 min, twice in water for 1 min and
once for 5 min in sodium borohydride solution. The arrays
are submerged in water for 2 min at 95°C, transferred
into 0.2% SDS for 1 min, rinsed twice with water,
air dried and stored in the dark at 25°C.
[0176] Cell or tissue mRNA is isolated or commercially
obtained and probes are prepared by a single round
of reverse transcription. Probes are hybridized to 1 cm²
microarrays under a 14 x 14 mm glass coverslip for 6-12
hours at 60°C. Arrays are washed for 5 min at 25°C in
low stringency wash buffer (1 x SSC/0.2% SDS), then
for 10 min at room temperature in high stringency wash
buffer (0.1 x SSC/0.2% SDS). Arrays are scanned in 0.1
x SSC using a fluorescence laser scanning device fitted
with a custom filter set. Accurate differential expression
measurements are obtained by taking the average of
the ratios of two independent hybridizations.
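
The averaging step can be written out as a one-line calculation. In this sketch each hybridization is summarized for a single array element by a pair of signal intensities (test sample, reference sample); that pairing and the example values are assumptions, since the text does not give the measurement layout.

```python
def mean_expression_ratio(hybridizations):
    """Average the test/reference signal ratio over independent hybridizations.

    hybridizations -- list of (test_signal, reference_signal) pairs, one per
                      hybridization, for a single array element
    """
    ratios = [test / reference for test, reference in hybridizations]
    return sum(ratios) / len(ratios)


# Hypothetical intensities from two independent hybridizations.
ratio = mean_expression_ratio([(200.0, 100.0), (300.0, 100.0)])
```

Averaging the ratios rather than the raw intensities keeps each hybridization's internal normalization intact.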
[0177] Quantitative analysis of the expression of
genes may also be performed with EST-related nucleic
acids, fragments of EST related nucleic acids, positional
segments of EST-related nucleic acids, or fragments of
positional segments of EST-related nucleic acids in
complementary DNA arrays as described by Pietu et al.
(Genome Research 6:492-503, 1996). The EST-related
nucleic acids, fragments of EST related nucleic acids,
positional segments of EST-related nucleic acids, or
fragments of positional segments of EST-related nucleic
acids thereof are PCR amplified and spotted on membranes.
Then, mRNAs originating from various tissues
or cells are labeled with radioactive nucleotides. After
hybridization and washing in controlled conditions, the
hybridized mRNAs are detected by phospho-imaging or
autoradiography. Duplicate experiments are performed
and a quantitative analysis of differentially expressed
mRNAs is then performed.

[0178] Alternatively, expression analysis of the
EST-related nucleic acids, fragments of EST related nucleic
acids, positional segments of EST-related nucleic acids, or
fragments of positional segments of EST-related nucleic
acids can be done through high density nucleotide arrays
as described by Lockhart et al. (Nature Biotechnology
14:1675-1680, 1996) and Sosnowsky et al. (Proc.
Natl. Acad. Sci. 94:1119-1123, 1997). Oligonucleotides
of 15-50 nucleotides corresponding to sequences of
EST-related nucleic acids, fragments of EST related nucleic
acids, positional segments of EST-related nucleic acids,
or fragments of positional segments of EST-related
nucleic acids are synthesized directly on the chip (Lockhart
et al., supra) or synthesized and then addressed to
the chip (Sosnowsky et al., supra). Preferably, the
oligonucleotides are about 20 nucleotides in length.
[0179] cDNA probes labeled with an appropriate compound,
such as biotin, digoxigenin or fluorescent dye,
are synthesized from the appropriate mRNA population
and then randomly fragmented to an average size of 50
to 100 nucleotides. The said probes are then hybridized
to the chip. After washing as described in Lockhart et al.,
supra, and application of different electric fields
(Sosnowsky et al., supra), the dyes or labeling compounds
are detected and quantified. Duplicate hybridizations
are performed. Comparative analysis of the intensity
of the signal originating from cDNA probes on
the same target oligonucleotide in different cDNA samples
indicates a differential expression of the mRNA corresponding
to the 5' EST, consensus contigated 5' EST
or extended cDNA from which the oligonucleotide sequence
has been designed.

III. Use of 5' ESTs to Clone Extended cDNAs and to
Clone the Corresponding Genomic DNAs

[0180] Once 5' ESTs or consensus contigated 5' ESTs
which include the 5' end of the corresponding mRNAs
have been selected using the procedures described
above, they can be utilized to isolate extended cDNAs
which contain sequences adjacent to the 5' ESTs or contigated
consensus 5' ESTs. The extended cDNAs may
include the entire coding sequence of the protein encoded
by the corresponding mRNA, including the authentic
translation start site. If the extended cDNA encodes a
secreted protein, it may contain the signal sequence,
and the sequence encoding the mature protein remaining
after cleavage of the signal peptide. Extended cDNAs
which include the entire coding sequence of the
protein encoded by the corresponding mRNA are referred
to herein as "full-length cDNAs." Alternatively, the
extended cDNAs may not include the entire coding sequence
of the protein encoded by the corresponding
mRNA, although they do include sequences adjacent to
the 5'ESTs or contigated consensus 5' ESTs. In some
embodiments in which the extended cDNAs are derived
from an mRNA encoding a secreted protein, the extended
cDNAs may include only the sequence encoding the
mature protein remaining after cleavage of the signal
peptide, or only the sequence encoding the signal peptide.

[0181] Example 17 below describes a general method
for obtaining extended cDNAs using 5' ESTs or consensus
contigated 5' ESTs. Example 18 below describes
the cloning and sequencing of several extended cDNAs,

coding sequence and authentic 5' end of the corresponding
mRNA for several secreted proteins.
[0182] The methods of Examples 17 and 18 can also

using these methods encode at least
5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive

of SEQ ID NOs: 24-4100 and 8178-36681

using these methods encode at least 5, 10, 15, 20, 25,
30, 35, 40, 50, 75, 100, or 150 consecutive amino acids
of one of the proteins encoded by the sequences of SEQ
ID NOs: 24-4100.

EXAMPLE 17 

1 1, I -Hrr^-T^STsto Clone and 

ilfc^didcD!^^ 

f nrrQC pr»nriinq mRNA 

[0183] The following general method has been used
to quickly and efficiently isolate extended cDNAs including
sequence adjacent to the sequences of the 5' ESTs
used to obtain them. This method may be applied to obtain
extended cDNAs for any 5' EST or consensus contigated
5' EST of the invention, including those 5' ESTs
or consensus contigated 5' ESTs encoding secreted
proteins. This method is summarized in Figure 3.

1. Obtaining Extended cDNAs

a) First strand synthesis

[0184] The method takes advantage of the known 5'
sequence of the mRNA. A reverse transcription reaction
is conducted on purified mRNA with a poly dT primer
containing a nucleotide sequence at its 5' end allowing

which corresponds to the 3' end of the mRNA. Such a
primer and a commercially-available reverse transcriptase
enzyme are added to a buffered mRNA sample

site of the RNAs. Nucleotide monomers are then added
to complete the first strand synthesis.
[0185] After removal of the mRNA hybridized to the
first cDNA strand by alkaline hydrolysis, the products of
the alkaline hydrolysis and the residual poly dT primer
can be eliminated with an exclusion column.

b) Second strand synthesis

[0186] A pair of nested primers on each end is designed
based on the known 5' sequence from the 5' EST
or contigated consensus 5' EST and the known 3' end
added by the poly dT primer used in the first strand synthesis.
Software used to design primers is either based
on GC content and melting temperatures of oligonucleotides,
such as OSP (Hillier and Green, PCR Meth. Appl.
1:124-128, 1991), or based on the octamer frequency
disparity method (Griffais et al., Nucleic Acids Res. 19:
3887-3891, 1991) such as PC-Rare
(http://bioinformatics.weizmann.ac.il/software/PCRare/doc/manuel.html).
[0187] Preferably, the nested primers at the 5' end and
the nested primers at the 3' end are separated from one
another by four to nine bases. These primer sequences
may be selected to have melting temperatures and specificities
suitable for use in PCR.
[0188] A first PCR run is performed using the outer
primer from each of the nested pairs. A second PCR run
using the same enzyme and the inner primer
from each of the nested pairs is then performed on a
small sample of the first PCR product. Thereafter, the
primers and remaining nucleotide monomers are removed.

2. Sequencing of Full Length Extended cDNAs or
Fragments Thereof

[0189] Due to the lack of position constraints on the
design of 5' nested primers compatible for PCR use using
the OSP software, amplicons of two types are obtained.
Preferably, the second 5' primer is located upstream
of the translation initiation codon thus yielding a
nested PCR product containing the entire coding sequence.
Such a full length extended cDNA may be used
in a direct cloning procedure. However, in some cases,
the second 5' primer is located downstream of the translation
initiation codon, thereby yielding a PCR product
containing only part of the ORF. Such incomplete PCR
products are submitted to a modified procedure described
in section b below.

a) Nested PCR products containing complete ORFs

[0190] When the resulting nested PCR product contains
the complete coding sequence, as predicted from
the 5' EST or consensus contigated 5' EST sequence,
it is cloned in an appropriate vector.

b) Nested PCR products containing incomplete ORFs 



[0191] When the amplicon does not contain the complete
coding sequence, intermediate steps are necessary
to obtain both the complete coding sequence and
a PCR product containing the full coding sequence. The
complete coding sequence can be assembled from several
partial sequences determined directly from different
PCR products.

[0192] Once the full coding sequence has been completely
determined, new primers compatible for PCR
use are then designed to obtain amplicons containing 
the whole coding region. However, in such cases, 3' 
primers compatible for PCR use are located inside the 
3' UTR of the corresponding mRNA, thus yielding am- 
plicons which lack part of this region, i.e. the polyA tract 
and sometimes the polyadenylation signal, as illustrated 
in Figure 3. Such full length extended cDNAs are then 
cloned into an appropriate vector. 
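
Primer compatibility for PCR is described above in terms of GC content and melting temperature. As a rough illustration of those two quantities, the sketch below uses the Wallace rule (2 °C per A or T, 4 °C per G or C), a common approximation for short oligonucleotides. It is not the method implemented in OSP or PC-Rare, and the example primer is hypothetical.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Approximate melting temperature in degrees C using the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Hypothetical 20-mer with balanced base composition.
primer = "ATGC" * 5
primer_gc = gc_content(primer)
primer_tm = wallace_tm(primer)
```

Primers in a pair would typically be chosen so that their estimated melting temperatures are close to each other.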

c) Sequencing extended cDNAs 

[0193] Sequencing of extended cDNAs can be performed
using a Dye Terminator approach with the AmpliTaq
DNA polymerase FS kit available from Perkin
Elmer.

[0194] In order to sequence PCR fragments, primer
walking is performed using software such as OSP to
choose primers and automated computer software such
as ASMG (Sutton et al., Genome Science Technol. 1:
9-19, 1995) to construct contigs of walking sequences
including the initial 5' tag using minimum overlaps of 32
nucleotides. Preferably, primer walking is performed until
the sequences of full length cDNAs are obtained.
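
The role of the 32-nucleotide minimum overlap in building contigs can be shown with a minimal sketch. It merges two reads only on an exact suffix/prefix match, which is a simplification: real assemblers such as ASMG tolerate sequencing errors, and nothing here reproduces that program. The read sequences are hypothetical.

```python
MIN_OVERLAP = 32  # minimum overlap in nucleotides, from the text


def suffix_prefix_overlap(left, right):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for length in range(min(len(left), len(right)), 0, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def merge_reads(left, right, min_overlap=MIN_OVERLAP):
    """Merge two reads into one contig, or return None if the overlap is too short."""
    overlap = suffix_prefix_overlap(left, right)
    if overlap < min_overlap:
        return None
    return left + right[overlap:]


# Hypothetical walking reads sharing a 40-nucleotide exact overlap.
shared = "ACGT" * 10
read_1 = "TTTTT" + shared
read_2 = shared + "GGGGG"
contig = merge_reads(read_1, read_2)
```

Reads whose overlap falls below the minimum are left unmerged, which in practice would call for a further round of primer walking.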

3. Cloning of Full Length Extended cDNAs 

[0195] The PCR product containing the full coding sequence
is then cloned in an appropriate vector. For example,
the extended cDNAs can be cloned into any expression
vector known in the art.
[0196] Since the PCR products obtained as described
above are blunt ended molecules that can be cloned in
either direction, the orientation of several clones for
each PCR product is determined. Then, 4 to 10 clones
are ordered in microtiter plates and subjected to a PCR
reaction using a first primer located in the vector close
to the cloning site and a second primer located in the
portion of the extended cDNA corresponding to the 3'
end of the mRNA. This second primer may be the antisense
primer used in anchored PCR in the case of direct
cloning (case a) or the antisense primer located inside
the 3'UTR in the case of indirect cloning (case b). Clones
in which the start codon of the extended cDNA is operably
linked to the promoter in the vector so as to permit
expression of the protein encoded by the extended cDNA
are conserved and sequenced. In addition to the
ends of cDNA inserts, approximately 50 bp of vector
DNA on each side of the cDNA insert are also sequenced.

[0197] Cloned PCR products are then entirely sequenced
in order to obtain at least two sequences per
clone. Preferably, the sequences are obtained from both
sense and antisense strands according to the aforementioned
procedure with the following modifications.
First, both 5' and 3' ends of cloned PCR products are
sequenced in order to confirm the identity of the clone.
Second, primer walking is performed if the full
coding region has not been obtained yet. Contigation is
then performed using primer walking sequences for
cloned products as well as walking sequences that have
already been contigated for uncloned PCR products. The
sequence is considered complete when the resulting contigs
include the whole coding region as well as overlapping
sequences with vector DNA on both ends. All the
contigated sequences for each cloned amplicon are
then used to obtain a consensus sequence.

4. Selection of cloned full length sequences obtained
from the 5' ESTs of the present invention



[0198] A negative selection may be performed in order
to eliminate unwanted cloned sequences resulting
from either contaminants or PCR artifacts as follows.
Sequences matching contaminant sequences such as
vector DNA, tRNA, mtRNA, rRNA sequences are discarded
as well as those encoding ORF sequences exhibiting
extensive homology to repeats. Sequences obtained
by direct cloning using nested primers on 5' and
3' tags (section 1, case a) but lacking polyA tail may be
discarded. Only ORFs containing a signal peptide and
ending either before the polyA tail (case a) or before the
end of the cloned 3'UTR (case b) may be selected.
Then, ORFs containing unlikely mature proteins such
as mature proteins whose size is less than 20 amino acids
or less than 25% of the immature protein size may
be eliminated.
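
The last criterion of the negative selection can be written as a short Python check. This is a sketch under the assumption that the mature protein is the ORF product minus its signal peptide; the function and parameter names are hypothetical, while the two thresholds (20 amino acids, 25%) are those given in the text.

```python
def is_unlikely_mature_protein(
    orf_length_aa: int,
    signal_peptide_length_aa: int,
    min_mature_aa: int = 20,
    min_fraction: float = 0.25,
) -> bool:
    """Return True when the mature protein is shorter than `min_mature_aa`
    amino acids or shorter than `min_fraction` of the immature protein."""
    # Assumption: mature length = immature (full ORF) length minus signal peptide.
    mature_length = orf_length_aa - signal_peptide_length_aa
    return (
        mature_length < min_mature_aa
        or mature_length < min_fraction * orf_length_aa
    )
```

ORFs for which the function returns True would be the ones eliminated at this step.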

[0199] Then, for each remaining full length extended
cDNA containing several ORFs, a preselection of ORFs
may be performed using the following criteria. The longest
ORF with a signal peptide is preferred. If the ORF
sizes are similar, the chosen ORF is the one whose signal
peptide has the highest score according to the Von
Heijne method.
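
The ORF preselection rule can be sketched in Python as follows. The text does not define when two ORF sizes are "similar", so the 5% tolerance below is purely an assumption, as are the class and field names; every ORF passed in is assumed to already carry a signal peptide and a precomputed signal peptide score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Orf:
    name: str
    length_aa: int
    signal_score: float  # hypothetical field holding the signal peptide score


def preselect_orf(orfs: list[Orf], similar_size_tolerance: float = 0.05) -> Optional[Orf]:
    """Prefer the longest ORF; among ORFs of similar size, keep the one
    with the highest signal peptide score."""
    if not orfs:
        return None
    longest = max(orf.length_aa for orf in orfs)
    # Assumption: "similar" means within `similar_size_tolerance` of the longest ORF.
    candidates = [
        orf for orf in orfs
        if orf.length_aa >= longest * (1 - similar_size_tolerance)
    ]
    return max(candidates, key=lambda orf: orf.signal_score)
```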

[0200] Sequences of full length extended cDNA
clones may then be compared pairwise with BLAST after
masking of the repeat sequences. Sequences containing
at least 90% homology over 30 nucleotides may
be clustered in the same class. Each cluster may then
be subjected to a cluster analysis that detects sequences
resulting from internal priming or from alternative
splicing, identical sequences or sequences with several
frameshifts. This automatic analysis serves as a basis
for manual selection of the sequences.
[0201] Manual selection can be carried out using automatically
generated reports for each sequenced full
length extended cDNA clone. During this manual procedure,
a selection is operated between clones belonging
to the same class as follows.
[0202] Selection of full length extended cDNA clones
encoding sequences of interest is performed using the
following criteria. Structural parameters (initial tag,
polyadenylation site and signal) may be checked. Then,
homologies with known nucleic acids and proteins may be
examined in order to determine whether the clone sequence
matches a known nucleic acid/protein sequence
and, in the latter case, its covering rate and the date at
which the sequence became public. Sequences resulting
from chimera or double inserts or located on chromosome
breaking points as assessed by homology to
other sequences may be discarded during this procedure
as well.

[0203] Extended cDNAs prepared as described
above may be subsequently engineered to obtain nucleic
acids which include desired portions of the extended
cDNA using conventional techniques such as subcloning,
PCR, or in vitro oligonucleotide synthesis. For
example, if the extended cDNA is derived from a gene
encoding a secreted polypeptide, it may include the full
coding sequences (i.e. the sequences encoding the signal
peptide and the mature protein remaining after the
signal peptide is cleaved off), the sequences encoding
the mature polypeptide (i.e. the polypeptide generated
after the signal peptide is cleaved off), or only the coding
sequences for the signal peptides.
[0204] Similarly, nucleic acids containing any other
desired portion of the coding sequences for the encoded
protein may be obtained. For example, the nucleic acid
may contain at least 10, 12, 15, 18, 20, 23, 25, 28, 30,
35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive
bases of an extended cDNA.

[0205] Once an extended cDNA has been obtained,
it can be sequenced to determine the amino acid sequence
it encodes. Once the encoded amino acid sequence
has been determined, one can create and identify
any of the many conceivable cDNAs that will encode
that protein by simply using the degeneracy of the genetic
code. For example, allelic variants or other homologous
nucleic acids can be identified as described below.
Alternatively, nucleic acids encoding the desired
amino acid sequence can be synthesized in vitro.
[0206] In a preferred embodiment, the coding se- 
quence may be selected using the known codon or co- 
don pair preferences for the host organism in which the 
cDNA is to be expressed. 
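
As an illustration of choosing a coding sequence from host codon preferences, the following Python sketch back-translates a protein using one preferred codon per amino acid. The table is a placeholder chosen for illustration only, not measured codon usage for any organism, and the names are hypothetical; in practice the table would be replaced with the known preferences of the intended host.

```python
# Illustrative placeholder table: one valid codon per amino acid ("*" = stop).
# These choices are assumptions for the example, not host usage data.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def back_translate(protein: str, table: dict[str, str] = PREFERRED_CODON) -> str:
    """Return one of the many coding sequences for `protein`, using the
    preferred codon of the host for each residue."""
    try:
        return "".join(table[residue] for residue in protein.upper())
    except KeyError as err:
        raise ValueError(f"no codon defined for residue {err.args[0]!r}") from None
```

Because the genetic code is degenerate, swapping the table changes the nucleotide sequence but not the encoded protein.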

[0207] In addition to PCR based methods for obtaining
cDNAs which include the authentic 5' end of the corresponding
mRNA as well as the full protein coding sequence
of the corresponding mRNA, traditional hybridization
based methods may also be employed. These
methods may also be used to obtain the genomic DNAs
which encode the mRNAs from which the 5' ESTs or
contigated consensus 5' ESTs were derived, mRNAs
corresponding to the extended cDNAs, or nucleic acids
which are homologous to extended cDNAs, 5' ESTs, or
contigated consensus 5' ESTs. Example 18 below provides
examples of such methods.
[0208] Each identified ORF may be scanned for the
presence of a signal peptide in the first 50 amino acids
or, where appropriate, within shorter regions down to 20
amino acids or less in the ORF, using the matrix method
of von Heijne (Nuc. Acids Res. 14: 4683-4690 (1986))
and the modification described in Example 12.

d) Homology to either nucleotide or protein sequences 


[0209] Sequences of full-length extended cDNAs are 
then compared to known nucleotide sequences. 
Polypeptides encoded by full-length extended cDNAs 
are then compared to known polypeptide sequences. 
[0210] Sequences of full-length extended cDNAs are
compared to known nucleic acid sequences such as the
vertebrate and EST sequences of Genbank, EMBL databases
and Genseq (Derwent's database of patented
nucleotide sequences). Full-length cDNA sequences
are also compared to the sequences of a private database
(Genset internal sequences) in order to find sequences
that have already been identified by applicants.
Sequences of full-length extended cDNAs with more
than 90% homology over 30 nucleotides using either
BLASTN or BLAST2N are identified as sequences that
have already been described. Matching vertebrate sequences
are subsequently examined using FASTA;
full-length extended cDNAs with more than 70% homology
over 30 nucleotides are identified as sequences that
have already been described.

[0211] ORFs encoded by full-length extended cDNAs
as defined in section c) are subsequently compared to
known amino acid sequences found in public databases
such as Swissprot, PIR and Genpept (Derwent's database
of patented protein sequences). These analyses
were performed using BLASTP with the parameter W=8
and allowing a maximum of 10 matches. Sequences of
full-length extended cDNAs showing extensive homology
to known protein sequences are recognized as already
identified proteins.

[0212] In addition, the three-frame conceptual translation
products of the top strand of full-length extended
cDNAs are compared to publicly known amino acid sequences
of Swissprot using BLASTX with the parameter
E=0.001. Sequences of full-length extended cDNAs
with more than 70% homology over 30 amino acid
stretches are detected as already identified proteins.

Selection of cloned full-length sequences obtained
from the 5' ESTs of the present invention

[0213] Cloned full-length extended cDNA sequences
that have already been characterized by the aforementioned
computer analysis are then submitted to an automatic
procedure in order to preselect full-length extended
cDNAs containing sequences of interest.



a) Automatic sequence preselection 

[0214] All complete cloned full-length extended cDNAs
clipped for vector on both ends are considered.
First, a negative selection is operated in order to eliminate
unwanted cloned sequences resulting from either
contaminants or PCR artifacts as follows. Sequences
matching contaminant sequences such as vector DNA,
tRNA, mtRNA, rRNA sequences are discarded as well
as those encoding ORF sequences exhibiting extensive
homology to repeats as defined in section 4 a). Sequences
obtained by direct cloning using nested primers
on 5' and 3' tags (section 1, case a) but lacking polyA
tail are discarded. Only ORFs containing a signal peptide
and ending either before the polyA tail (case a) or
before the end of the cloned 3'UTR (case b) are kept.
Then, ORFs containing unlikely mature proteins such
as mature proteins whose size is less than 20 amino acids
or less than 25% of the immature protein size are
eliminated.

[0215] Then, for each remaining full-length extended
cDNA containing several ORFs, a preselection of ORFs
is performed using the following criteria. The longest
ORF with a signal peptide is preferred. If the ORF sizes
are similar, the chosen ORF is the one whose signal peptide
has the highest score according to the Von Heijne
method.

[0216] Sequences of full-length extended cDNA 
clones are then compared pairwise with BLAST after 
masking of the repeat sequences. Sequences contain- 
ing at least 90% homology over 30 nucleotides are clus- 
tered in the same class. Each cluster is then subjected 
to a cluster analysis that detects sequences resulting 
from internal priming or from alternative splicing, identi- 
cal sequences or sequences with several frameshifts. 
This automatic analysis serves as a basis for manual
selection of the sequences. 
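
The grouping of clones into classes can be sketched in Python as below. The sketch assumes the pairwise BLAST comparison has already produced a list of clone pairs meeting the 90% over 30 nucleotides criterion, and it assumes single-linkage grouping (a clone joins a class if it matches any member), which the text does not state explicitly. The function name and inputs are hypothetical.

```python
def cluster_clones(
    clone_names: list[str],
    similar_pairs: list[tuple[str, str]],
) -> list[list[str]]:
    """Group clones into classes from a list of pairwise matches,
    using a union-find structure (single-linkage assumption)."""
    parent = {name: name for name in clone_names}

    def find(name: str) -> str:
        # Follow parent links to the class representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for first, second in similar_pairs:
        parent[find(first)] = find(second)

    classes: dict[str, list[str]] = {}
    for name in clone_names:
        classes.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in classes.values())
```

Each returned class would then be passed to the cluster analysis and manual selection described in the text.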

b) Manual sequence selection 



[0217] Manual selection can be carried out using automatically
generated reports for each sequenced full-length
extended cDNA clone. During this manual procedure,
a selection is operated between clones belonging
to the same class as follows. ORF sequences encoded
by clones belonging to the same class are aligned and
compared. If the homology between nucleotide sequences
of clones belonging to the same class is more
than 90% over 30 nucleotide stretches or if the homology
between amino acid sequences of clones belonging
to the same class is more than 80% over 20 amino acid
stretches, then the clones are considered as being identical.
The chosen ORF is either the one exhibiting
matches with known amino acid sequences or the best
one according to the criteria mentioned in the automatic
sequence preselection section. If the nucleotide and
amino acid homologies are less than 90% and 80% respectively,
the clones are said to encode distinct proteins
which can be both selected if they contain sequences
of interest.
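
The identity test between two clones of the same class can be sketched in Python. The sketch assumes the two sequences have already been aligned to equal length without gaps and treats "homology" as percent identity within a sliding window; both are simplifying assumptions, and the function names are hypothetical. The window sizes and thresholds are those of the text.

```python
def max_window_identity(seq_a: str, seq_b: str, window: int) -> float:
    """Highest percent identity over any window of `window` positions in
    two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if len(seq_a) < window:
        return 0.0
    matches = [int(x == y) for x, y in zip(seq_a, seq_b)]
    current = sum(matches[:window])
    best = current
    for index in range(window, len(matches)):
        # Slide the window one position to the right.
        current += matches[index] - matches[index - window]
        best = max(best, current)
    return 100.0 * best / window


def clones_identical(nt_a: str, nt_b: str, aa_a: str, aa_b: str) -> bool:
    """Apply the >90% over 30 nucleotides or >80% over 20 amino acids rule."""
    return (
        max_window_identity(nt_a, nt_b, 30) > 90.0
        or max_window_identity(aa_a, aa_b, 20) > 80.0
    )
```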

[0218] Selection of full-length extended cDNA clones
encoding sequences of interest is performed using the
following criteria. Structural parameters (initial tag,
polyadenylation site and signal) are first checked. Then,
homologies with known nucleic acids and proteins are
examined in order to determine whether the clone sequence
matches a known nucleotide/protein sequence
and, in the latter case, its covering rate and the date at
which the sequence became public. If there is no extensive
match with sequences other than ESTs or genomic
DNA, or if the clone sequence brings substantial new
information, such as encoding a protein resulting from
alternative splicing of an mRNA coding for an already
known protein, the sequence is kept. Examples of such
cloned full-length extended cDNAs containing sequences
of interest are described in Example 18. Sequences
resulting from chimera or double inserts or located on
chromosome breaking points as assessed by homology
to other sequences are discarded during this procedure.

[0219] Extended cDNAs prepared as described
above may be subsequently engineered to obtain nucleic
acids which include desired portions of the extended
cDNA using conventional techniques such as subcloning,
PCR, or in vitro oligonucleotide synthesis. For
example, nucleic acids which include only the full coding
sequences (i.e. the sequences encoding the signal peptide
and the mature protein remaining after the signal
peptide is cleaved off) may be obtained using techniques
known to those skilled in the art. Alternatively,
conventional techniques may be applied to obtain nucleic
acids which contain only the coding sequences for
the mature protein remaining after the signal peptide is
cleaved off or nucleic acids which contain only the coding
sequences for the signal peptides.
[0220] Similarly, nucleic acids containing any other
desired portion of the coding sequences for the encoded
protein may be obtained. For example, the nucleic acid
may contain at least 10, 15, 18, 20, 25, 28, 30, 35, 40,
50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases
of an extended cDNA.

[0221] Once an extended cDNA has been obtained,
it can be sequenced to determine the amino acid sequence
it encodes. Once the encoded amino acid sequence
has been determined, one can create and identify
any of the many conceivable cDNAs that will encode
that protein by simply using the degeneracy of the genetic
code. For example, allelic variants or other homologous
nucleic acids can be identified as described below.
Alternatively, nucleic acids encoding the desired
amino acid sequence can be synthesized in vitro.
[0222] In a preferred embodiment, the coding sequence
may be selected using the known codon or codon
pair preferences for the host organism in which the
cDNA is to be expressed.

[0223] In addition to PCR based methods for obtaining
cDNAs which include the authentic 5' end of the corresponding
mRNA as well as the complete protein coding
sequence of the corresponding mRNA, traditional
hybridization based methods may also be employed.
These methods may also be used to obtain the genomic
DNAs which encode the mRNAs from which the 5' ESTs
or consensus contigated 5' ESTs were derived, mRNAs
corresponding to the extended cDNAs, or nucleic acids
which are homologous to extended cDNAs, 5' ESTs, or
consensus contigated 5' ESTs. Example 18 below provides
examples of such methods.

EXAMPLE 18 

Methods for Obtaining cDNAs which Include the
Entire Coding Region and the Authentic 5' End of the
Corresponding mRNA or Nucleic Acids Homologous to
Extended cDNAs, 5' ESTs or Consensus Contigated 5'
ESTs



[0224] A full-length cDNA library can be made using
the strategies described in Examples 1-4 above by replacing
the random nonamer used in Example 2 with an
oligo-dT primer. Alternatively, a cDNA library or genomic
DNA library may be obtained from a commercial source
or made using techniques familiar to those skilled in the
art.

[0225] Such cDNA or genomic DNA libraries may be
used to isolate extended cDNAs obtained from 5' ESTs
or consensus contigated 5' ESTs or nucleic acids homologous
to extended cDNAs, 5' ESTs, or consensus
contigated 5' ESTs as follows. The cDNA library or genomic
DNA library is hybridized to a detectable probe.
The detectable probe may comprise at least 10, 15, 18,
20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400
or 500 consecutive nucleotides of the 5' EST, consensus
contigated 5' EST, or extended cDNA.
[0226] Techniques for identifying cDNA clones in a
cDNA library which hybridize to a given probe sequence
are disclosed in Sambrook et al., Molecular Cloning: A
Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory
Press, 1989. The same techniques may be used to
isolate genomic DNAs.

[0227] Briefly, cDNA or genomic DNA clones which
hybridize to the detectable probe are identified and isolated
for further manipulation as follows. The detectable
probe described in the preceding paragraph is labeled
with a detectable label such as a radioisotope or a fluorescent
molecule. Techniques for labeling the probe are
well known and include phosphorylation with polynucleotide
kinase, nick translation, in vitro transcription, and
non radioactive techniques. The cDNAs or genomic
DNAs in the library are transferred to a nitrocellulose or
nylon filter and denatured. After blocking of non specific
sites, the filter is incubated with the labeled probe for an
amount of time sufficient to allow binding of the probe
to cDNAs or genomic DNAs containing a sequence capable
of hybridizing thereto.
[0228] By varying the stringency of the hybridization
conditions used to identify cDNAs or genomic DNAs 
which hybridize to the detectable probe, cDNAs or ge- 
nomic DNAs having different levels of homology to the 
probe can be identified and isolated as described below. 



1. Identification of cDNA or Genomic DNA Sequences
Having a High Degree of Homology to the Labeled
Probe

[0229] To identify cDNAs or genomic DNAs having a
high degree of homology to the probe sequence, the
melting temperature of the probe may be calculated using
the following formulas:

[0230] For probes between 14 and 70 nucleotides in
length the melting temperature (Tm) is calculated using
the formula: Tm = 81.5 + 16.6(log [Na+]) + 0.41(fraction
G+C) - (600/N) where N is the length of the probe.
[0231] If the hybridization is carried out in a solution
containing formamide, the melting temperature may be
calculated using the equation Tm = 81.5 + 16.6(log [Na+])
+ 0.41(fraction G+C) - (0.63% formamide) - (600/N) where
N is the length of the probe.
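
The two melting temperature formulas can be combined into one Python function, shown here as a sketch. Two points are assumptions rather than statements from the text: the logarithm is taken as base 10, and the G+C term is expressed in percent by default (the text writes "fraction G+C"; the `gc_as_percent` flag lets the caller use a 0 to 1 fraction instead). The function name is hypothetical.

```python
import math


def melting_temperature(
    probe: str,
    sodium_molar: float,
    formamide_percent: float = 0.0,
    gc_as_percent: bool = True,
) -> float:
    """Tm = 81.5 + 16.6 log10[Na+] + 0.41 (G+C) - 0.63 (% formamide) - 600/N,
    where N is the probe length in nucleotides."""
    length = len(probe)
    gc_fraction = sum(base in "GCgc" for base in probe) / length
    # Assumption: the G+C term is in percent unless the caller says otherwise.
    gc_term = 100.0 * gc_fraction if gc_as_percent else gc_fraction
    return (
        81.5
        + 16.6 * math.log10(sodium_molar)
        + 0.41 * gc_term
        - 0.63 * formamide_percent
        - 600.0 / length
    )
```

With `formamide_percent` left at 0 the function reduces to the first formula; the text gives that formula for probes of 14 to 70 nucleotides.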

[0232] Prehybridization may be carried out in 6X SSC,
5X Denhardt's reagent, 0.5% SDS, 100 µg denatured
fragmented salmon sperm DNA or 6X SSC, 5X Denhardt's
reagent, 0.5% SDS, 100 µg denatured fragmented
salmon sperm DNA, 50% formamide. The formulas
for SSC and Denhardt's solutions are listed in Sambrook
et al., supra.

[0233] Hybridization is conducted by adding the detectable
probe to the prehybridization solutions listed
above. Where the probe comprises double stranded
DNA, it is denatured before addition to the hybridization
solution. The filter is contacted with the hybridization solution
for a sufficient period of time to allow the probe to
hybridize to extended cDNAs or genomic DNAs containing
sequences complementary thereto or homologous
thereto. For probes over 200 nucleotides in length, the
hybridization may be carried out at 15-25°C below the
Tm. For shorter probes, such as oligonucleotide probes,
the hybridization may be conducted at 15-25°C below
the Tm. Preferably, for hybridizations in 6X SSC, the hybridization
is conducted at approximately 68°C. Preferably,
for hybridizations in 50% formamide containing solutions,
the hybridization is conducted at approximately

[0234] All of the foregoing hybridizations would be
considered to be under "stringent" conditions.
[0235] Following hybridization, the filter is washed in
2X SSC, 0.1% SDS at room temperature for 15 minutes.
The filter is then washed with 0.1X SSC, 0.5% SDS at
room temperature for 30 minutes to 1 hour. Thereafter,
the solution is washed at the hybridization temperature
in 0.1X SSC, 0.5% SDS. A final wash is conducted in
0.1X SSC at room temperature.

[0236] cDNAs or genomic DNAs which have hybrid- 
ized to the probe are identified by autoradiography or 
other conventional techniques. 



2. Obtaining cDNA or Genomic DNA Sequences Having
Lower Degrees of Homology to the Labeled Probe

[0237] The above procedure may be modified to identify
cDNAs or genomic DNAs having decreasing levels
of homology to the probe sequence. For example, to obtain
cDNAs or genomic DNAs of decreasing homology
to the detectable probe, less stringent conditions may
be used. For example, the hybridization temperature
may be decreased in increments of 5°C from 68°C to
42°C in a hybridization buffer having a sodium concentration
of approximately 1 M. Following hybridization, the
filter may be washed with 2X SSC, 0.5% SDS at the temperature
of hybridization. These conditions are considered
to be "moderate" conditions above 50°C and "low"
conditions below 50°C.

[0238] Alternatively, the hybridization may be carried
out in buffers, such as 6X SSC, containing formamide
at a temperature of 42°C. In this case, the concentration
of formamide in the hybridization buffer may be reduced
in 5% increments from 50% to 0% to identify clones having
decreasing levels of homology to the probe. Following
hybridization, the filter may be washed with 6X SSC,
0.5% SDS at 50°C. These conditions are considered to
be "moderate" conditions above 25% formamide and
"low" conditions below 25% formamide.
[0239] cDNAs or genomic DNAs which have hybrid- 
ized to the probe are identified by autoradiography. 
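
The two stepwise series of reduced-stringency conditions can be enumerated with a short Python sketch. The function names are hypothetical. The text does not say how a value exactly at the threshold (50°C or 25% formamide) is classified, so the sketch labels it "unspecified" rather than guessing.

```python
def classify(value: int, threshold: int) -> str:
    """Label a condition relative to the threshold given in the text."""
    if value > threshold:
        return "moderate"
    if value < threshold:
        return "low"
    return "unspecified"  # the text does not classify the threshold itself


def temperature_series(start_c: int = 68, stop_c: int = 42, step_c: int = 5):
    """Hybridization temperatures decreasing in 5 degree steps (about 1 M sodium)."""
    return [(t, classify(t, 50)) for t in range(start_c, stop_c - 1, -step_c)]


def formamide_series(start_pct: int = 50, stop_pct: int = 0, step_pct: int = 5):
    """Formamide concentrations decreasing in 5% steps at 42 degrees C."""
    return [(p, classify(p, 25)) for p in range(start_pct, stop_pct - 1, -step_pct)]
```

Note that stepping down from 68°C in 5°C increments gives 43°C as the last value at or above 42°C.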


3. Determination of the Degree of Homology between
the Obtained cDNAs or Genomic DNAs and 5'ESTs,
Consensus Contigated 5'ESTs, or Extended cDNAs or
Between the Polypeptides Encoded by the Obtained
cDNAs or Genomic DNAs and the Polypeptides
Encoded by the 5'ESTs, Consensus Contigated 5'ESTs,
or Extended cDNAs

[0240] To determine the level of homology between
the hybridized cDNA or genomic DNA and the 5'EST,
consensus contigated 5'EST or extended cDNA from
which the probe was derived, the nucleotide sequences
of the hybridized nucleic acid and the 5'EST, consensus
contigated 5'EST or extended cDNA from which the
probe was derived are compared. The sequences of the
5'EST, consensus contigated 5'EST or extended cDNA
from which the probe was derived and the sequences
of the cDNA or genomic DNA which hybridized to the
detectable probe may be stored on a computer readable
medium as described below and compared to one another
using any of a variety of algorithms familiar to
those skilled in the art, including those described below.
[0241] To determine the level of homology between
the polypeptide encoded by the hybridizing cDNA or genomic
DNA and the polypeptide encoded by the 5'EST,
consensus contigated 5'EST or extended cDNA from
which the probe was derived, the polypeptide sequence
encoded by the hybridized nucleic acid and the polypeptide
sequence encoded by the 5'EST, consensus contigated
5'EST or extended cDNA from which the probe
was derived are compared. The sequences of the
polypeptide encoded by the 5'EST, consensus contigated
5'EST or extended cDNA from which the probe was
derived and the polypeptide sequence encoded by the
cDNA or genomic DNA which hybridized to the detectable
probe may be stored on a computer readable medium
as described below and compared to one another
using any of a variety of algorithms familiar to those
skilled in the art, including those described below.
[0242] Protein and/or nucleic acid sequence homologies
may be evaluated using any of the variety of sequence
comparison algorithms and programs known in
the art. Such algorithms and programs include, but are
by no means limited to, TBLASTN, BLASTP, FASTA,
TFASTA, and CLUSTALW (Pearson and Lipman, 1988,
Proc. Natl. Acad. Sci. USA 85(8):2444-2448; Altschul et
al., 1990, J. Mol. Biol. 215(3):403-410; Thompson et al.,
1994, Nucleic Acids Res. 22(2):4673-4680; Higgins et
al., 1996, Methods Enzymol. 266:383-402; Altschul et
al., 1990, J. Mol. Biol. 215(3):403-410; Altschul et al.,
1993, Nature Genetics 3:266-272).
[0243] In a particularly preferred embodiment, protein
and nucleic acid sequence homologies are evaluated
using the Basic Local Alignment Search Tool ("BLAST")
which is well known in the art (see, e.g., Karlin and Altschul,
1990, Proc. Natl. Acad. Sci. USA 87:2267-2268;
Altschul et al., 1990, J. Mol. Biol. 215:403-410; Altschul
et al., 1993, Nature Genetics 3:266-272; Altschul et al.,
1997, Nuc. Acids Res. 25:3389-3402). In particular, five
specific BLAST programs are used to perform the following
tasks:

(1) BLASTP and BLAST3 compare an amino acid 
query sequence against a protein sequence data- 
base; 

(2) BLASTN compares a nucleotide query se- 
quence against a nucleotide sequence database; 

(3) BLASTX compares the six-frame conceptual 
translation products of a query nucleotide sequence 
(both strands) against a protein sequence data- 
base; 

(4) TBLASTN compares a query protein sequence 
against a nucleotide sequence database translated 
in all six reading frames (both strands); and 

(5) TBLASTX compares the six-frame translations 
of a nucleotide query sequence against the six- 
frame translations of a nucleotide sequence data- 
base. 
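
The six-frame conceptual translation that BLASTX and TBLASTX rely on can be illustrated with the Python sketch below. It is not part of BLAST; it only shows how the three forward and three reverse-strand reading frames of a nucleotide sequence are produced, using the standard genetic code. The function names are hypothetical, and codons containing characters other than A, C, G, T are translated as "X" by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    return sequence.upper().translate(COMPLEMENT)[::-1]


def translate(sequence: str) -> str:
    """Translate complete codons from the first base; "*" marks a stop."""
    sequence = sequence.upper()
    return "".join(
        CODON_TABLE.get(sequence[i:i + 3], "X")
        for i in range(0, len(sequence) - 2, 3)
    )


def six_frame_translations(sequence: str) -> dict[str, str]:
    """Return translations for frames +1, +2, +3 (top strand) and
    -1, -2, -3 (reverse complement)."""
    frames = {}
    for strand, template in (("+", sequence), ("-", reverse_complement(sequence))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames
```

Keeping only the "+1", "+2" and "+3" entries gives a three-frame translation of the top strand.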

[0244] The BLAST programs identify homologous sequences
by identifying similar segments, which are referred
to herein as "high-scoring segment pairs," between
a query amino or nucleic acid sequence and a
test sequence which is preferably obtained from a protein
or nucleic acid sequence database. High-scoring
segment pairs are preferably identified (i.e., aligned) by
means of a scoring matrix, many of which are known in
the art. Preferably, the scoring matrix used is the
BLOSUM62 matrix (Gonnet et al., 1992, Science 256:
1443-1445; Henikoff and Henikoff, 1993, Proteins 17:
49-61). Less preferably, the PAM or PAM250 matrices
may also be used (see, e.g., Schwartz and Dayhoff,
eds., 1978, Matrices for Detecting Distance Relationships:
Atlas of Protein Sequence and Structure, Washington:
National Biomedical Research Foundation).
[0245] The BLAST programs evaluate the statistical
significance of all high-scoring segment pairs identified,
and preferably select those segments which satisfy a
user-specified threshold of significance, such as a
user-specified percent homology. Preferably, the statistical
significance of a high-scoring segment pair is evaluated
using the statistical significance formula of Karlin (see,
e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci.
USA 87:2267-2268).

[0246] The parameters used with the above algo- 
rithms may be adapted depending on the sequence 
length and degree of homology studied. In some em- 
bodiments, the parameters may be the default parame- 
ters used by the algorithms in the absence of instruc- 
tions from the user. 

[0247] In some embodiments, the level of homology
between the hybridized nucleic acid and the extended
cDNA, 5'EST, or 5' consensus contigated EST from
which the probe was derived may be determined using
the FASTDB algorithm described in Brutlag et al. Comp.
App. Biosci. 6:237-245, 1990. In such analyses the parameters
may be selected as follows: Matrix=Unitary,
k-tuple=4, Mismatch Penalty=1, Joining Penalty=30,
Randomization Group Length=0, Cutoff Score=1, Gap
Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the
length of the sequence which hybridizes to the probe,
whichever is shorter. Because the FASTDB program
does not consider 5' or 3' truncations when calculating
homology levels, if the sequence which hybridizes to the
probe is truncated relative to the sequence of the extended
cDNA, 5'EST, or consensus contigated 5'EST
from which the probe was derived the homology level is
manually adjusted by calculating the number of nucleotides
of the extended cDNA, 5'EST, or consensus contigated
5' EST which are not matched or aligned with
the hybridizing sequence, determining the percentage
of total nucleotides of the hybridizing sequence which
the non-matched or non-aligned nucleotides represent,
and subtracting this percentage from the homology level.
For example, if the hybridizing sequence is 700 nucleotides
in length and the extended cDNA, 5'EST, or
consensus contigated 5' EST sequence is 1000 nucleotides
in length wherein the first 300 bases at the 5' end
of the extended cDNA, 5'EST, or consensus contigated
5' EST are absent from the hybridizing sequence, and
wherein the overlapping 700 nucleotides are identical,
the homology level would be adjusted as follows. The
non-matched, non-aligned 300 bases represent 30% of
the length of the extended cDNA, 5'EST, or consensus
contigated 5' EST. If the overlapping 700 nucleotides are
100% identical, the adjusted homology level would be
100-30=70% homology. It should be noted that the preceding
adjustments are only made when the non-matched
or non-aligned nucleotides are at the 5' or 3'
ends. No adjustments are made if the non-matched or
non-aligned sequences are internal or under any other
conditions.
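
The manual correction for terminal truncations can be expressed as a small Python function. The sketch follows the worked example in the text, where the penalty is the share of the extended cDNA, 5'EST, or consensus contigated 5' EST length that is left unmatched at the 5' or 3' end; the function and parameter names are hypothetical.

```python
def adjusted_homology(
    raw_homology_percent: float,
    reference_length: int,
    unmatched_terminal_positions: int,
) -> float:
    """Subtract from the reported homology the percentage of the reference
    sequence that is unmatched or unaligned at its 5' or 3' end.

    Internal unmatched positions must not be counted here; the text makes
    no adjustment for them.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    penalty_percent = 100.0 * unmatched_terminal_positions / reference_length
    return raw_homology_percent - penalty_percent
```

For the example in the text (a 1000-nucleotide reference, 300 unmatched 5' bases, and an overlap that is 100% identical), the function returns 70.0.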

[0248] For example, using the above methods, nucleic
acids having at least 95% nucleic acid homology, at
least 96% nucleic acid homology, at least 97% nucleic
acid homology, at least 98% nucleic acid homology, at
least 99% nucleic acid homology, or more than 99% nucleic
acid homology to the extended cDNA, 5'EST, or
consensus contigated 5' EST from which the probe was
derived may be obtained and identified. Such nucleic
acids may be allelic variants or related nucleic acids
from other species. Similarly, by using progressively
less stringent hybridization conditions one can obtain
and identify nucleic acids having at least 90%, at least
85%, at least 80% or at least 75% homology to the extended
cDNA, 5'EST, or consensus contigated 5' EST
from which the probe was derived.
[0249] Using the above methods and algorithms such
as FASTA with parameters depending on the sequence
length and degree of homology studied, for example the
default parameters used by the algorithms in the absence
of instructions from the user, one can obtain nucleic
acids encoding proteins having at least 99%, at
least 98%, at least 97%, at least 96%, at least 95%, at
least 90%, at least 85%, at least 80% or at least 75%
homology to the protein encoded by the extended cDNA,
5'EST, or consensus contigated 5' EST from which
the probe was derived. In some embodiments, the homology
levels can be determined using the "default"
opening penalty and the "default" gap penalty, and a
scoring matrix such as PAM 250 (a standard scoring matrix;
see Dayhoff et al., in: Atlas of Protein Sequence and
Structure, Vol. 5, Supp. 3 (1978)).
[0250] Alternatively, the level of polypeptide homology may be
determined using the FASTDB algorithm described by Brutlag et al.,
Comp. App. Biosci. 6:237-245, 1990. In such analyses the parameters may
be selected as follows: Matrix=PAM 0, k-tuple=2, Mismatch Penalty=1,
Joining Penalty=20, Randomization Group Length=0, Cutoff Score=1,
Window Size=Sequence Length, Gap Penalty=5, Gap Size Penalty=0.05,
Window Size=500 or the length of the homologous sequence, whichever is
shorter. If the homologous amino acid sequence is shorter than the
amino acid sequence encoded by the extended cDNA, 5'EST, or consensus
contigated 5' EST as a result of an N terminal and/or C terminal
deletion, the results may be manually corrected as follows. First, the
number of amino acid residues of the amino acid sequence encoded by the
extended cDNA, 5'EST, or consensus contigated 5' EST which are not
matched or aligned with the homologous sequence is determined. Then,
the percentage of the length of the sequence encoded by the extended
cDNA, 5'EST, or consensus contigated 5' EST which the non-matched or
non-aligned amino acids represent is calculated. This percentage is
subtracted from the homology level. For example wherein the amino acid
sequence encoded by
. hina 5'EST, or consensus contigated 5*
homotogous sequence is 80 ^ extended cD - 

qu ence ^XsT^Z^ok o, Ih. 
^^^^ 

cdna, ^ . -^ssis, 

using 5' ESTs or consensus contigated b 
lined in the lollowing paragraphs^ 
ro2S21 Extended cDNAs may be prepared Dy o 
[0252J txit* or organism of interest 

^17B.366B1. If it is desired 

located upstream of the ^ran cDNA 
second primer » extended I to genera ^ 

S ^ b-h' ends o« the cONA to be «*, 
S Extended cDNAs containing SMragmen.s o. 
[vwi " . „ ran arPd bv hybrid zing an mRNA 
the mRNA may be P re P ar ^ Q 7 DNO s- 24-4100 and 

,o a fragment of an ES. t(an scribing the 

the pnmer to the mRNAs, ana ^ 
hybridized primer to make a first cDNA _«« 

24-4100 and B178-356B1. 



Thereafter, a second cDNA strand complementary to the first cDNA
strand is synthesized. The second cDNA strand may be made by
hybridizing a primer complementary to sequences in the first cDNA
strand to the first cDNA strand and extending the primer to generate
the second cDNA strand.
[0256] The double stranded extended cDNAs made using the methods
described above are isolated and cloned. The extended cDNAs may be
cloned into vectors such as plasmids or viral vectors capable of
replicating in an appropriate host cell. For example, the host cell
may be a bacterial, mammalian, avian, or insect cell.
folsT Techniques for isolating mRNA, reverse tran- 
Sga primer hybri** to mRNA to generate M 
,s cDNrLand.extendingaprimertomakeasecondc^ 

NAstrand comp.emen.ary to the firs. cONA «WnA£ 

Alternately, other procedures may be used 
or obLning ful.-.eng* cDNAs or 
2S one approach. full-length or f 1 ^ 

pared irom mRNA and cloned into double stranded 
nLemids as follows. The cDNA library « 

randed phagemids * then 
by treatment with an endonuclease, such as me _Gene 
30 u oroduct of the phage Fl and an exonuclease (Chang 
K 27 95-8 1993). A biotinylated oligonucte- 
otide comprising the sequence of a fragment of an EST- 

ohaqemids Preferably, the fragment comprrses at least 
OS 10 12 15 17 , 8. 20, 23, 25, or 28 consecutive nude- 
o«des o. the sequences o, SEQ ID NOs. 24-4100 and 

hybrids between the biotinylated oligonucle- 
S" ind Phagemids are isolated by incubating the hy- 

retrieving the beads with a magnet (Fry at ai, Boteen 
TZ u: 124-131. 1992). Thereafter, the resuH^g 
ohaqemids are released from the beads and converted 
WoCblestrandedDNAusingaprimer specific forme 

45 * EST or consensus contigated 5'EST sequence used 
45 odes^bttinylatedoli^ 

be used. The resulting double stranded 0W«w» 

^rSng any of the above described methcrfs in 
sectton 111 a plurality of extended cDNAs containing full- 

oTuse in diagnose assays as described below. 
EXAMPLE 19

Flf l1 I onqth CPNAS 



[0261] ^P^f-^;r^ 

,8 were used to ° b * n Jf °*tin a variety o. tis- 

SEQ ID NO.1 <™™ . cDN A en codes the 

having a von Heijne score <* 8^ . 

[0264, «» «*• 

identification number SB 35 2h g g 

von Heijne score of 10.7. encoded by the 

102651 FU rrl O 0 ;cDNAfmabe".eened«orme 
extended or lull-length cDNAs may _ or fa 

presence o. known s,ruc,ur ?'°S° add sequences 
^presence o. signatures, s«-- J e(s Q| a 

which are well ^ en ^°^ lUB polypeptides 

that were screened tor the P <*<™ software , rom 
signatures and mot,ts using the ^Proscar, ^ 
the GCG package and the Prosite 
provided below. encoded by the 

[0266] ^^f'^'^O 7 vernal designation 

nature from positions 90tomw fodenland 

spread ^ --^^ligands such as 
primate species, bind nyo H ex . 
phospholipids and ' "^^ e , lhol ^ to play a 
pressed in brain and in testis an tetion o) the 

role in cell growth ™ & ™^™^ rane remode- 
ling. They may act either tnroug a gee Sch . 

oentgen and Jollfes. R3* ^ • ^ in ot 
Taken together, these data sugg ^ration 
SEQ ID NO. 8 may ^V^^ bB related to 
and in membrane T*^™^ indiag- 
ma ,e fertility. Thus. lhes ^ ,n n 3egenera.ive dis- 

rility . „, Q poinNO-lOencodedbythe 

[02671 The protein ol SEQ IUN<J. 



u ™. cpo ID NO:9 (internal designation 
,U, '-' en fc^ HQ Fi as^ws homologies with a lami, 
tOS-Oia-S-O-HS-FLClshow 

ol tysophosphol^pases W*- ^ 

(yeast, rabbit, '^ en,s k and h ^ ieL-independeri 

phosphol pass M ac q| ^ „ 
phro., 9 .117B-11BO \ n qyG ^ tif 0 car. 

Exhibit me active site , s GXSXG^ ^ ^ 
boxylesterases that is also louno 

and von He„ne, CM « , mal me pro 

'"ISS^S ™& ™» •* - 

is tein ol SEQ ID nu. tu "«y n j mfe ro . 

[0268] ^f'^'^o^i (internal designation 
I™ 54 To FL? shows reU homology to a 

^t^S^Oeuk*. o, in the Golgi ap- 
catedin the endop js „, glyco p r ote.ns, 

30 paratus, catalyzes the oi_ y tefisHc , ea . 

turesdefmec as fto»«^ ^ prelty well ^ 

49 and a large 2^"°~££ g C ter 
predicted by the software TopPred l II (Ciaros 
P Heiine. CAB.OS *pfc P °.elo, 

Taken togemer, these data suggest lhaUh p 

« SEQ ID NO 12 o, 
polysaccharides, and of the caroony • 
apoproteins and g^s and^ ^ 

tfon. Thus, this protein ^,'^^" M i n g i but not 
o, treating severa. ^^, iS o,. 
" i e^r— -diseases, 
eluding rheumatoid arthnt,s_ ^ 
l ° m 91 , h ^DNA SEQ -D NO D ,3°in.emal designation 

bZIP family ot transcnpt.on factor*, ana e p 
domain composed of a basic DNA-binding domain and of a leucine zipper
allowing protein dimerization. The basic domain is conserved in the
protein of SEQ ID NO. 14 as shown by the characteristic PROSITE
signature (positions 224-237) except for a conservative substitution of
a glutamic acid with an aspartic acid in position 233. The typical
PROSITE signature for leucine zipper is also present (positions 259 to
280). Taken together, these data suggest that the protein of SEQ ID NO.
14 may bind to DNA, hence regulating gene expression as a transcription
factor. Thus, this protein may be useful in diagnosing and/or treating
several types of disorders including, but not limited to, cancer.
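
A signature scan of the kind described above can be sketched in Python. The sketch assumes the leucine zipper signature has the form L-x(6)-L-x(6)-L-x(6)-L (22 residues, which matches the 259 to 280 span quoted above); the function name and the toy sequence are hypothetical, and the code does not reproduce the ProScan software or the PROSITE database.

```python
import re

# Assumed leucine zipper signature: four leucines spaced seven residues
# apart. The lookahead lets overlapping candidate sites all be reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein):
    """Return 1-based (start, end) positions of each signature match."""
    return [(m.start() + 1, m.start() + 22)
            for m in LEUCINE_ZIPPER.finditer(protein.upper())]


# Toy sequence (not taken from any SEQ ID NO in this document).
toy = "MK" + "LAAAAAA" * 3 + "L" + "GG"
print(find_leucine_zippers(toy))  # [(3, 24)]
```

A match to such a short pattern is only suggestive; it can occur by chance in proteins that do not dimerize through a leucine zipper.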
[0270] Bacterial clones containing plasmids containing the full length
cDNAs described above are presently stored in the inventor's
laboratories under the internal identification numbers provided above.
The inserts may be recovered from the deposited materials by growing an
aliquot of the appropriate bacterial clone in the appropriate medium.
The plasmid DNA can then be isolated using plasmid isolation procedures
familiar to those skilled in the art such as alkaline lysis minipreps
or large scale alkaline lysis plasmid isolation procedures. If desired,
the plasmid DNA may be further enriched by centrifugation on a cesium
chloride gradient, size exclusion chromatography, or anion exchange
chromatography. The plasmid DNA obtained using these procedures may
then be manipulated using standard cloning techniques familiar to those
skilled in the art. Alternately, a PCR can be done with primers
designed at both ends of the EST insertion. The PCR product which
corresponds to the 5'EST can then be manipulated using standard cloning
techniques familiar to those skilled in the art.

IV. Expression of Proteins



[0271] EST-related nucleic acids, fragments of EST-related nucleic
acids, positional segments of EST-related nucleic acids, and fragments
of positional segments of EST-related nucleic acids may be used to
express the polypeptides which they encode. In particular, they may be
used to express EST-related polypeptides, fragments of EST-related
polypeptides, positional segments of EST-related polypeptides, or
fragments of positional segments of EST-related polypeptides. In some
embodiments, the EST-related nucleic acids, positional segments of
EST-related nucleic acids, and fragments of positional segments of
EST-related nucleic acids may be used to express the full polypeptide
(i.e. the signal peptide and the mature polypeptide) of a secreted
protein, the mature protein (i.e. the polypeptide generated after
cleavage of the signal peptide), or the signal peptide of a secreted
protein. If desired, nucleic acids encoding the signal peptide may be
used to facilitate secretion of the expressed protein. It will be
appreciated that a plurality of EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of EST-related nucleic
acids, or fragments of positional segments of EST-related nucleic acids
may be simultaneously cloned into expression vectors to create an
expression library for analysis of the encoded proteins as described
below.


EXAMPLE 20 

Expression of the Proteins Encoded by the Genes
Corresponding to the 5'ESTs or Consensus Contigated
5' ESTs

[0272] To express their encoded proteins, the EST-related nucleic
acids, fragments of EST-related nucleic acids, positional segments of
EST-related nucleic acids, or fragments of positional segments of
EST-related nucleic acids are cloned into a suitable expression vector.
In some instances, nucleic acids encoding EST-related polypeptides,
fragments of EST-related polypeptides, positional segments of
EST-related polypeptides or fragments of positional segments of
EST-related polypeptides may be cloned into a suitable expression
vector.

[0273] In some embodiments, the nucleic acids inserted into the
expression vector may comprise the coding sequence of a sequence
selected from the group consisting of SEQ ID NOs: 24-4100. In other
embodiments, the nucleic acids inserted into the expression vector may
comprise the full coding sequence (i.e. the nucleotides encoding the
signal peptide and the mature polypeptide) of one of SEQ ID NOs:
3721-3811. In some embodiments, the nucleic acid inserted into the
expression vector may comprise the nucleotides of one of the sequences
of SEQ ID NOs: 3721-3811 which encode the mature polypeptide (i.e. the
nucleotides encoding the polypeptide generated after cleavage of the
signal peptide). In further embodiments, the nucleic acids inserted
into the expression vector may comprise the nucleotides of SEQ ID NOs:
24-652 and 3721-3811 which encode the signal peptide to facilitate
secretion of the expressed protein. The nucleic acids inserted into the
expression vectors may also contain sequences upstream of the sequences
encoding the signal peptide, such as sequences which regulate
expression levels or sequences which confer tissue specific expression.
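
The distinction drawn above between the full coding sequence, the nucleotides encoding the signal peptide, and the nucleotides encoding the mature polypeptide can be sketched in Python. The function name, the toy sequence, and the assumption that the signal peptide length is already known (for example from a signal peptide prediction) are all hypothetical.

```python
def split_coding_sequence(cds, signal_peptide_codons):
    """Split a coding sequence at the signal peptide cleavage boundary.

    Returns (signal_peptide_nucleotides, mature_polypeptide_nucleotides).
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    boundary = 3 * signal_peptide_codons
    if not 0 < boundary < len(cds):
        raise ValueError("cleavage boundary must fall inside the sequence")
    return cds[:boundary], cds[boundary:]


# Toy 7-codon sequence with an assumed 3-codon signal peptide.
signal, mature = split_coding_sequence("ATGAAACTGGCTGCAGGCTAA", 3)
print(signal)  # ATGAAACTG
print(mature)  # GCTGCAGGCTAA
```

Either part, or the whole sequence, could then be carried forward as the insert, mirroring the alternative embodiments listed in the paragraph.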
[0274] The nucleic acid inserted into the expression vector may encode
a polypeptide comprising one of the sequences of SEQ ID NOs: 4101-8177.
In some embodiments, the nucleic acid inserted into the expression
vector may encode the full polypeptide sequence (i.e. the signal
peptide and the mature polypeptide) included in one of SEQ ID NOs:
7798-7888. In other embodiments, the nucleic acid inserted into the
expression vector may encode the mature polypeptide (i.e. the
polypeptide generated after cleavage of the signal peptide) included in
one of the sequences of SEQ ID NOs: 7798-7888. In further embodiments,
the nucleic acids inserted into the expression vector may encode the
signal peptide included in one of the sequences of SEQ ID NOs:
4101-4729 and 7798-7888.

[0275] The nucleic acid encoding the protein or polypeptide to be
expressed is operably linked to a promoter in an expression vector
using conventional cloning technology. The expression vector may be any
of the mammalian, yeast, insect or bacterial expression systems known
in the art. Commercially available vectors and expression systems are
available from a variety of suppliers including Genetics Institute
(Cambridge, MA), Stratagene (La Jolla, California), Promega (Madison,
Wisconsin), and Invitrogen (San Diego, California). If desired, to
enhance expression and facilitate proper protein folding, the codon
context and codon pairing of the sequence may be optimized for the
particular expression organism in which the expression vector is
introduced, as explained by Hatfield et al., U.S. Patent No. 5,082,767.

[0276] The following is provided as one exemplary method to express the
proteins encoded by the nucleic acids described above. In some
instances the nucleic acid encoding the protein or polypeptide to be
expressed includes a methionine initiation codon and a polyA signal. If
the nucleic acid encoding the polypeptide to be expressed lacks a
methionine to serve as the initiation site, an initiating methionine
can be introduced next to the first codon of the nucleic acid using
conventional techniques. Similarly, if the nucleic acid encoding the
protein or polypeptide to be expressed lacks a polyA signal, this
sequence can be added to the construct by, for example, splicing out
the polyA signal from pSG5 (Stratagene) using BglI and SalI restriction
endonuclease enzymes and incorporating it into the mammalian expression
vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the
gag gene from Moloney Murine Leukemia Virus. The position of the LTRs
in the construct allows efficient stable transfection. The vector
includes the Herpes Simplex thymidine kinase promoter and the
selectable neomycin gene. The nucleic acid encoding the polypeptide to
be expressed is obtained by PCR from the bacterial vector using
oligonucleotide primers complementary to the nucleic acid encoding the
protein or polypeptide to be expressed and containing restriction
endonuclease sequences for PstI incorporated into the 5' primer and
BglII at the 5' end of the 3' primer, taking care to ensure that the
nucleic acid encoding the protein or polypeptide to be expressed is
correctly positioned with respect to the polyA signal. The purified
fragment obtained from the resulting PCR reaction is digested with
PstI, blunt ended with an exonuclease, digested with BglII, purified
and ligated to pXT1, now containing a polyA signal and digested with
BglII.
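
The primer design step above can be sketched in Python. The sketch assumes the standard recognition sequences CTGCAG (PstI) and AGATCT (BglII); the function names, the 18-nucleotide default annealing length, and the toy insert are hypothetical. The extra 5' bases that are often added so the enzymes cut efficiently near a fragment end are omitted for brevity.

```python
PST_I = "CTGCAG"   # PstI recognition sequence
BGL_II = "AGATCT"  # BglII recognition sequence
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(insert, anneal_len=18):
    """Build a 5' primer carrying PstI and a 3' primer carrying BglII.

    Each restriction site sits at the 5' end of its primer, followed by
    the bases that anneal to the insert.
    """
    insert = insert.upper()
    if len(insert) < anneal_len:
        raise ValueError("insert is shorter than the annealing length")
    forward = PST_I + insert[:anneal_len]
    reverse = BGL_II + reverse_complement(insert[-anneal_len:])
    return forward, reverse


# Toy insert with a short annealing length to keep the output readable.
fwd, rev = design_primers("ATGGCTAAACTGGGTCCATTTGACGAGTAA", anneal_len=6)
print(fwd)  # CTGCAGATGGCT
print(rev)  # AGATCTTTACTC
```

A practical design would also check that neither recognition sequence occurs inside the insert, since an internal site would be cut during the digestions described above.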
[0277] The ligated product is transfected into mouse NIH 3T3 cells
using Lipofectin (Life Technologies, Inc., Grand Island, New York)
under conditions outlined in the product specification. Positive
transfectants are selected after growing the transfected cells in 600
ng/ml G418 (Sigma, St. Louis, Missouri).

[0278] Alternatively, the nucleic acid encoding the protein or
polypeptide to be expressed may be cloned into pED6dpc2 as described
above. The resulting pED6dpc2 constructs may be transfected into a
suitable host cell, such as COS 1 cells. Methotrexate resistant cells
are selected and expanded. The expressed protein or polypeptide may be
isolated, purified, or enriched as described above.

[0279] To confirm expression of the desired protein or polypeptide, the
proteins or polypeptides produced by cells containing a vector with a
nucleic acid insert encoding the protein or polypeptide are compared to
those lacking such an insert. The expressed proteins are detected using
techniques familiar to those skilled in the art such as Coomassie blue
or silver staining or using antibodies against the protein or
polypeptide encoded by the nucleic acid insert. Antibodies capable of
specifically recognizing the protein of interest may be generated using
synthetic 15-mer peptides having a sequence encoded by the appropriate
nucleic acid. The synthetic peptides are injected into mice to generate
antibody to the polypeptide encoded by the nucleic acid.
[0280] If the proteins or polypeptides encoded by the nucleic acid
inserts are secreted, medium prepared from the host cells or organisms
containing an expression vector which contains a nucleic acid insert
encoding the desired protein or polypeptide is compared to medium
prepared from the control cells or organism. The presence of a band in
medium from the cells containing the nucleic acid insert which is
absent from preparations from the control cells indicates that the
protein or polypeptide encoded by the nucleic acid insert is being
expressed and secreted. Generally, the band corresponding to the
protein encoded by the nucleic acid insert will have a mobility near
that expected based on the number of amino acids in the open reading
frame of the nucleic acid insert. However, the band may have a mobility
different than that expected as a result of modifications such as
glycosylation, ubiquitination, or enzymatic cleavage.
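
The expected band position mentioned above follows from the length of the open reading frame. A rough Python sketch is given here, assuming the common rule of thumb of about 110 Da per amino acid residue; the function name and the example length are hypothetical, and the estimate ignores the modifications listed in the paragraph.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption


def expected_mass_kda(orf_nucleotides, includes_stop_codon=True):
    """Estimate the unmodified polypeptide mass (kDa) from ORF length."""
    codons = orf_nucleotides // 3
    residues = codons - 1 if includes_stop_codon else codons
    if residues <= 0:
        raise ValueError("open reading frame encodes no residues")
    return residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# A 903-nucleotide ORF (300 codons plus a stop codon).
print(expected_mass_kda(903))  # 33.0
```

A band running well away from this estimate would be consistent with glycosylation, ubiquitination, or cleavage, as the text notes.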

[0281] Alternatively, if the protein expressed from the above
expression vectors does not contain sequences directing its secretion,
the proteins expressed from host cells containing an expression vector
with an insert encoding a secreted protein or portion thereof can be
compared to the proteins expressed in control host cells containing the
expression vector without an insert. The presence of a band in samples
from cells containing the expression vector with an insert which is
absent in samples from cells containing the expression vector without
an insert indicates that the desired protein or portion thereof is
being expressed. Generally, the band will have the mobility expected
for the secreted protein or portion thereof. However, the band may have
a mobility different than that expected as a result of modifications
such as glycosylation, ubiquitination, or enzymatic cleavage.

[0282] The expressed protein or polypeptide may be purified, isolated
or enriched using a variety of methods. In some methods, the protein or
polypeptide may be secreted into the culture medium via a native signal
peptide or a heterologous signal peptide operably linked thereto. In
some methods, the protein or polypeptide may be linked to a
heterologous polypeptide which facilitates its isolation, purification,
or enrichment such as a nickel binding polypeptide. The protein or
polypeptide may also be obtained by gel electrophoresis, ion exchange
chromatography, size chromatography, HPLC, salt precipitation,
immunoprecipitation, a combination of any of the preceding methods, or
any of the isolation, purification, or enrichment techniques familiar
to those skilled in the art.

[0283] The protein encoded by the nucleic acid insert may also be
purified using standard immunoaffinity chromatography techniques with
antibodies directed against the encoded protein or polypeptide as
described in more detail below. If antibody production is not possible,
the nucleic acid insert encoding the desired protein or polypeptide may
be incorporated into expression vectors designed for use in
purification schemes employing chimeric polypeptides. In such
strategies, the coding sequence of the nucleic acid insert is ligated
in frame with the gene encoding the other half of the chimera. The
other half of the chimera may be β-globin or a nickel binding
polypeptide. A chromatography matrix having antibody to β-globin or
nickel attached thereto is then used to purify the chimeric protein.
Protease cleavage sites may be engineered between the β-globin gene or
the nickel binding polypeptide and the extended cDNA or portion
thereof. Thus, the two polypeptides of the chimera may be separated
from one another by protease digestion.
[0284] One useful expression vector for generating β-globin chimerics
is pSG5 (Stratagene), which encodes rabbit β-globin. Intron II of the
rabbit β-globin gene facilitates splicing of the expressed transcript,
and the polyadenylation signal incorporated into the construct
increases the level of expression. These techniques as described are
well known to those skilled in the art of molecular biology. Standard
methods are published in methods texts such as Davis et al. (Basic
Methods in Molecular Biology, L.G. Davis, M.D. Dibner, and J.F. Battey,
ed., Elsevier Press, NY, 1986) and many of the methods are available
from Stratagene, Life Technologies, Inc., or Promega. Polypeptide may
additionally be produced from the construct using in vitro translation
systems such as the In vitro Express™ Translation Kit (Stratagene).

[0285] Following expression and purification of the proteins or
polypeptides encoded by the nucleic acid inserts, the purified proteins
may be tested for the ability to bind to the surface of various cell
types as described in Example 21 below. It will be appreciated that a
plurality of proteins expressed from these nucleic acid inserts may be
included in a panel of proteins to be simultaneously evaluated for the
activities specifically described below, as well as other biological
roles for which assays for determining activity are available.

EXAMPLE 21 

Analysis of Secreted Proteins to Determine Whether
They Bind to the Cell Surface

[0286] The EST-related nucleic acids, fragments of EST-related nucleic
acids, positional segments of EST-related nucleic acids, fragments of
positional segments of EST-related nucleic acids, nucleic acids
encoding the EST-related polypeptides, nucleic acids encoding fragments
of the EST-related polypeptides, nucleic acids encoding positional
segments of EST-related polypeptides, or nucleic acids encoding
fragments of positional segments of EST-related polypeptides are cloned
into expression vectors such as those described in Example 20. The
encoded proteins or polypeptides are purified, isolated, or enriched as
described above. Following purification, isolation, or enrichment, the
proteins or polypeptides are labeled using techniques known to those
skilled in the art. The labeled proteins or polypeptides are incubated
with cells or cell lines derived from a variety of organs or tissues to
allow the proteins to bind to any receptor present on the cell surface.
Following the incubation, the cells are washed to remove
non-specifically bound proteins or polypeptides. The specifically bound
labeled proteins or polypeptides are detected by autoradiography.
Alternatively, unlabeled proteins or polypeptides may be incubated with
the cells and detected with antibodies having a detectable label, such
as a fluorescent molecule, attached thereto.
[0287] Specificity of cell surface binding may be analyzed by
conducting a competition analysis in which various amounts of unlabeled
protein or polypeptide are incubated along with the labeled protein or
polypeptide. The amount of labeled protein or polypeptide bound to the
cell surface decreases as the amount of competitive unlabeled protein
or polypeptide increases. As a control, various amounts of an unlabeled
protein or polypeptide unrelated to the labeled protein or polypeptide
are included in some binding reactions. The amount of labeled protein
or polypeptide bound to the cell surface does not decrease in binding
reactions containing increasing amounts of unrelated unlabeled protein,
indicating that the protein or polypeptide encoded by the nucleic acid
binds specifically to the cell surface.
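
The expected outcome of the competition analysis can be illustrated with a simple equilibrium model in Python. This is a sketch under stated assumptions, not a method taken from this document: it assumes a single class of binding sites, an unlabeled competitor with the same dissociation constant as the labeled protein, and an unrelated control protein with no affinity for the site. All names and concentrations are hypothetical.

```python
def fraction_bound(labeled_nM, competitor_nM, kd_nM, competitor_binds=True):
    """Fraction of sites occupied by labeled protein at equilibrium.

    For a competitor sharing the same Kd, occupancy by the labeled
    protein is L / (Kd + L + C). An unrelated protein does not compete,
    so it contributes nothing to the denominator.
    """
    effective = competitor_nM if competitor_binds else 0.0
    return labeled_nM / (kd_nM + labeled_nM + effective)


for amount in (0.0, 8.0, 98.0):
    specific = fraction_bound(1.0, amount, 1.0)
    control = fraction_bound(1.0, amount, 1.0, competitor_binds=False)
    print(amount, specific, control)
```

In this model the labeled signal falls as the related competitor increases and stays flat with the unrelated control, which is the pattern the paragraph treats as evidence of specific binding.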
[0288] As discussed above, human proteins have been shown to have a
number of important physiological effects and, consequently, represent
a valuable therapeutic resource. The human proteins or polypeptides
made as described above may be evaluated to determine their
physiological activities as described below.

EXAMPLE 22

Assaying the Expressed Proteins or Polypeptides for
Cytokine, Cell Proliferation or Cell Differentiation
Activity

[0289] As discussed above, some human proteins act as cytokines or may
affect cellular proliferation or differentiation. Many protein factors
discovered to date, including all known cytokines, have exhibited
activity in one or more factor dependent cell proliferation assays, and
hence the assays serve as a convenient confirmation of cytokine
activity. The activity of a protein or polypeptide of the present
invention is evidenced by any one of a number of routine factor
dependent cell proliferation assays for cell lines including, without
limitation, 32D, DA2, DA1G, T10, B9, B9/11, BaF3, MC9/G, M+ (preB M+),
2E8, RB5, DA1, 123, T1165, HT2, CTLL2, TF-1, Mo7e and CMK. The proteins
or polypeptides prepared as described above may be evaluated for their
ability to regulate T cell or thymocyte proliferation in assays such as
those described above or in the following references: Current Protocols
in Immunology, Ed. by J.E. Coligan et al., Greene Publishing Associates
and Wiley-Interscience; Takai et al., J. Immunol. 137:3494-3500, 1986;
Bertagnolli et al., J. Immunol. 145:1706-1712, 1990; Bertagnolli et
al., Cellular Immunology 133:327-341, 1991; Bertagnolli et al., J.
Immunol. 149:3778-3783, 1992; Bowman et al., J. Immunol. 152:1756-1761,
1994.

[0290] In addition, numerous assays for cytokine production and/or the
proliferation of spleen cells, lymph node cells and thymocytes are
known. These include the techniques disclosed in Current Protocols in
Immunology, J.E. Coligan et al., Eds., 1:3.12.1-3.12.14, John Wiley and
Sons, Toronto, 1994; and Schreiber, R.D., in Current Protocols in
Immunology, supra, 1:6.8.1-6.8.8.

[0291] The proteins or polypeptides prepared as described above may
also be assayed for the ability to regulate the proliferation and
differentiation of hematopoietic or lymphopoietic cells. Many assays
for such activity are familiar to those skilled in the art, including
the assays in the following references: Bottomly et al., in Current
Protocols in Immunology, supra, 1:6.3.1-6.3.12; deVries et al., J. Exp.
Med. 173:1205-1211, 1991; Moreau et al., Nature 336:690-692, 1988;
Greenberger et al., Proc. Natl. Acad. Sci. U.S.A. 80:2931-2938, 1983;
Nordan, R., in Current Protocols in Immunology, supra, 1:6.6.1-6.6.5;
Smith et al., Proc. Natl. Acad. Sci. U.S.A. 83:1857-1861, 1986; Bennett
et al., in Current Protocols in Immunology, supra, 1:6.15.1; Ciarletta
et al., in Current Protocols in Immunology, supra, 1:6.13.1.
[0292] The proteins or polypeptides prepared as described above may
also be assayed for their ability to regulate T-cell responses to
antigens. Many assays for such activity are familiar to those skilled
in the art, including the assays described in the following references:
Chapter 3 (In vitro Assays for Mouse Lymphocyte Function), Chapter 6
(Cytokines and Their Cellular Receptors) and Chapter 7 (Immunologic
Studies in Humans) in Current Protocols in Immunology, supra;
Weinberger et al., Proc. Natl. Acad. Sci. USA 77:6091-6095, 1980;
Weinberger et al., Eur. J. Immun. 11:405-411, 1981; Takai et al., J.
Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512,
1988.
[0293] Those proteins or polypeptides which exhibit cytokine, cell
proliferation, or cell differentiation activity may then be formulated
as pharmaceuticals and used to treat clinical conditions in which
induction of cell proliferation or differentiation is beneficial.
Alternatively, as described in more detail below, nucleic acids
encoding these proteins or polypeptides or nucleic acids regulating the
expression of these proteins or polypeptides may be introduced into
appropriate host cells to increase or decrease the expression of the
proteins or polypeptides as desired.


EXAMPLE 23 

Assaying the Expressed Proteins or Polypeptides for 
Activity as Immune System Regulators 


[0294] The proteins or polypeptides prepared as described above may
also be evaluated for their effects as immune regulators. For example,
the proteins or polypeptides may be evaluated for their activity to
influence thymocyte or splenocyte cytotoxicity. Numerous assays for
such activity are familiar to those skilled in the art including the
assays described in the following references: Chapter 3 (In vitro
Assays for Mouse Lymphocyte Function 3.1-3.19) and Chapter 7
(Immunologic Studies in Humans) in Current Protocols in Immunology,
J.E. Coligan et al., Eds., Greene Publishing Associates and
Wiley-Interscience; Herrmann et al., Proc. Natl. Acad. Sci. USA
78:2488-2492, 1981; Herrmann et al., J. Immunol. 128:1968-1974, 1982;
Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et al., J.
Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512,
1988; Bowman et al., J. Virology 61:1992-1998; Bertagnolli et al.,
Cell. Immunol. 133:327-341, 1991; Brown et al., J. Immunol.
153:3079-3092, 1994.

[0295] The proteins or polypeptides prepared as described above may
also be evaluated for their effects on T-cell dependent immunoglobulin
responses and isotype switching. Numerous assays for such activity are
familiar to those skilled in the art, including the assays disclosed in
the following references: Maliszewski, J. Immunol. 144:3028-3033, 1990;
Mond et al., in Current Protocols in Immunology, 1:3.8.1-3.8.16, supra.
[0296] The proteins or polypeptides prepared as described above may
also be evaluated for their effect on immune effector cells, including
their effect on Th1 cells and cytotoxic lymphocytes. Numerous assays
for such activity are familiar to those skilled in the art, including
the expression of these proteins or polypeptides may be introduced
into appropriate host cells to increase or decrease the expression of
the proteins as desired.



EXAMPLE 25 

Assaying the Expressed Proteins or Polypeptides for
Regulation of Tissue Growth

[0313] The proteins or polypeptides encoded by the nucleic acids
described above may also be evaluated for their effect on tissue
growth. Numerous assays for such activity are familiar to those skilled
in the art, including the assays disclosed in International Patent
Publication No. WO95/16035, International Patent Publication No.
WO95/05846 and International Patent Publication No. WO91/07491.

[0314] Assays for wound healing activity include, without
limitation, those described in: Winter, Epidermal
Wound Healing, pps. 71-112 (Maibach, HI and Rovee,
DT, eds.), Year Book Medical Publishers, Inc., Chicago,
as modified by Eaglstein and Mertz, J. Invest. Dermatol.
71:382-84 (1978).

[0315] Those proteins or polypeptides which are involved
in the regulation of tissue growth may then be
formulated as pharmaceuticals and used to treat clinical
conditions in which regulation of tissue growth is beneficial.
For example, a protein or polypeptide may have
utility in compositions used for bone, cartilage, tendon,
ligament and/or nerve tissue growth or regeneration, as
well as for wound healing and tissue repair and replacement,
and in the treatment of burns, incisions and ulcers.
[0316] A protein or polypeptide encoded by the nucleic
acids described above which induces cartilage and/
or bone growth in circumstances where bone is not normally
formed, has application in the healing of bone
fractures and cartilage damage or defects in humans and
other animals. Such a preparation employing a protein or
a polypeptide of the invention may have prophylactic
use in closed as well as open fracture reduction and also
in the improved fixation of artificial joints. De novo bone
synthesis induced by an osteogenic agent contributes
to the repair of congenital, trauma induced or oncologic
resection induced craniofacial defects, and also is useful
in cosmetic plastic surgery.
[0317] A protein or polypeptide of this invention may
also be used in the treatment of periodontal disease,
and in other tooth repair processes. Such agents may
provide an environment to attract bone-forming cells,
stimulate growth of bone-forming cells or induce differentiation
of progenitors of bone-forming cells. A protein
of the invention may also be useful in the treatment of
osteoporosis or osteoarthritis, such as through stimulation
of bone and/or cartilage repair or by blocking inflammation
or processes of tissue destruction (collagenase
activity, osteoclast activity, etc.) mediated by inflammatory
processes.

[0318] Another category of tissue regeneration activity
that may be attributable to the proteins or polypeptides
encoded by the nucleic acids described above is
tendon/ligament formation. A protein or polypeptide encoded
by the nucleic acids described above, which induces
tendon/ligament-like tissue or other tissue formation
in circumstances where such tissue is not normally
formed, has application in the healing of tendon or ligament
tears, deformities and other tendon or ligament defects
in humans and other animals. Such a preparation
employing a tendon/ligament-like tissue inducing protein
may have prophylactic use in preventing damage
to tendon or ligament tissue, as well as use in the improved
fixation of tendon or ligament to bone or other
tissues, and in repairing defects to tendon or ligament
tissue. De novo tendon/ligament-like tissue formation
induced by a protein or polypeptide of the present invention
contributes to the repair of tendon or ligament
defects of congenital, traumatic or other origin and is
also useful in cosmetic plastic surgery for attachment or
repair of tendons or ligaments. The proteins or polypeptides
of the present invention may provide an environment
to attract tendon- or ligament-forming cells, stimulate
growth of tendon- or ligament-forming cells, induce
differentiation of progenitors of tendon- or ligament-forming
cells, or induce growth of tendon/ligament cells
or progenitors ex vivo for return in vivo to effect tissue
repair. The proteins or polypeptides of the invention may
also be useful in the treatment of tendinitis, carpal tunnel
syndrome and other tendon or ligament defects. The
therapeutic compositions may also include an appropriate
matrix and/or sequestering agent as a carrier as is
well known in the art.

[0319] The proteins or polypeptides of the present invention
may also be useful for proliferation of neural
cells and for regeneration of nerve and brain tissue, i.e.
for the treatment of central and peripheral nervous
system diseases and neuropathies, as well as mechanical
and traumatic disorders, which involve degeneration,
death or trauma to neural cells or nerve tissue.
More specifically, a protein or polypeptide may be used
in the treatment of diseases of the peripheral nervous
system, such as peripheral nerve injuries, peripheral
neuropathy and localized neuropathies, and central
nervous system diseases, such as Alzheimer's, Parkinson's
disease, Huntington's disease, amyotrophic lateral
sclerosis, and Shy-Drager syndrome. Further conditions
which may be treated in accordance with the
present invention include mechanical and traumatic disorders,
such as spinal cord disorders, head trauma and
cerebrovascular diseases such as stroke. Peripheral
neuropathies resulting from chemotherapy or other
medical therapies may also be treatable using a protein
or polypeptide of the invention.
[0320] Proteins or polypeptides of the invention may
also be useful to promote better or faster closure of non-healing
wounds, including without limitation pressure ulcers,
ulcers associated with vascular insufficiency, surgical
and traumatic wounds, and the like.
[0321] It is expected that a protein or polypeptide of
the present invention may also exhibit activity for generation
or regeneration of other tissues, such as organs
(including, for example, pancreas, liver, intestine, kidney,
skin, endothelium), muscle (smooth, skeletal or cardiac)
and vascular (including vascular endothelium) tissue,
or for promoting the growth of cells comprising such
tissues. Part of the desired effects may be by inhibition
or modulation of fibrotic scarring to allow normal tissue
to generate. A protein or polypeptide of the invention
may also exhibit angiogenic activity.
[0322] A protein or polypeptide of the present invention
may also be useful for gut protection or regeneration
and treatment of lung or liver fibrosis, reperfusion injury
in various tissues, and conditions resulting from systemic
cytokine damage.
[0323] A protein or polypeptide of the present invention
may also be useful for promoting or inhibiting differentiation
of tissues described above from precursor tissues
or cells; or for inhibiting the growth of tissues described
above.

[0324] Alternatively, as described in more detail be- 
low, nucleic acids encoding tissue growth regulating ac- 
tivity proteins or polypeptides or nucleic acids regulating 
the expression of such proteins or polypeptides may be 
introduced into appropriate host cells to increase or de- 
crease the expression of the proteins as desired. 

EXAMPLE 26 

Assaying the Expressed Proteins or Polypeptides for
Regulation of Reproductive Hormones 

[0325] The proteins or polypeptides of the present in- 
vention may also be evaluated for their ability to regulate 
reproductive hormones, such as follicle stimulating hor- 
mone. Numerous assays for such activity are familiar to 
those skilled in the art, including the assays disclosed 
in the following references: Vale et al., Endocrinol. 91:
562-572, 1972; Ling et al., Nature 321:779-782, 1986;
Vale et al., Nature 321:776-779, 1986; Mason et al., Nature
318:659-663, 1985; Forage et al., Proc. Natl. Acad.
Sci. USA 83:3091-3095, 1986; Chapter 6.12 in Current
Protocols in Immunology, J.E. Coligan et al. Eds.
Greene Publishing Associates and Wiley-Interscience;
Taub et al., J. Clin. Invest. 95:1370-1376, 1995; Lind et
al., APMIS 103:140-146, 1995; Muller et al., Eur. J. Immunol.
25:1744-1748; Gruber et al., J. Immunol. 152:
5860-5867, 1994; Johnston et al., J. Immunol. 153:
1762-1768, 1994.

[0326] Those proteins or polypeptides which exhibit 
activity as reproductive hormones or regulators of cell 
movement may then be formulated as pharmaceuticals 
and used to treat clinical conditions in which regulation 
of reproductive hormones is beneficial. For example,
a protein or polypeptide may exhibit activin- or inhibin-related
activities. Inhibins are characterized by their
ability to inhibit the release of follicle stimulating hormone
(FSH), while activins are characterized by their
ability to stimulate the release of FSH. Thus, a protein
or polypeptide of the present invention, alone or in heterodimers
with a member of the inhibin α family, may be
useful as a contraceptive based on the ability of inhibins
to decrease fertility in female mammals and decrease
spermatogenesis in male mammals. Administration of
sufficient amounts of other inhibins can induce infertility
in these mammals. Alternatively, the protein or polypeptide
of the invention, as a homodimer or as a heterodimer
with other protein subunits of the inhibin-β group, may
be useful as a fertility inducing therapeutic, based upon
the ability of activin molecules to stimulate FSH release
from cells of the anterior pituitary. See, for example,
United States Patent 4,798,885. A protein or
polypeptide of the invention may also be useful for ad- 
vancement of the onset of fertility in sexually immature 
mammals, so as to increase the lifetime reproductive 
performance of domestic animals such as cows, sheep 
and pigs. 



[0327] Alternatively, as described in more detail below,
nucleic acids encoding reproductive hormone regulating
activity proteins or polypeptides or nucleic acids
regulating the expression of such proteins or polypeptides
may be introduced into appropriate host cells to
increase or decrease the expression of the proteins or
polypeptides as desired.

EXAMPLE 27 

Assaying the Expressed Proteins or Polypeptides For 
Chemotactic/Chemokinetic Activity 

[0328] The proteins or polypeptides of the present invention
may also be evaluated for chemotactic/chemokinetic
activity. For example, a protein or polypeptide
of the present invention may have chemotactic or chemokinetic
activity (e.g., act as a chemokine) for mammalian
cells, including, for example, monocytes, fibroblasts,
neutrophils, T-cells, mast cells, eosinophils, epithelial
and/or endothelial cells. Chemotactic and chemokinetic
proteins or polypeptides can be used to mobilize
or attract a desired cell population to a desired site
of action. Chemotactic or chemokinetic proteins or
polypeptides provide particular advantages in treatment
of wounds and other trauma to tissues, as well as in
treatment of localized infections. For example, attraction
of lymphocytes, monocytes or neutrophils to tumors or
sites of infection may result in improved immune responses
against the tumor or infecting agent.

[0329] A protein or polypeptide has chemotactic activity
for a particular cell population if it can stimulate,
directly or indirectly, the directed orientation or movement
of such cell population. Preferably, the protein or
polypeptide has the ability to directly stimulate directed
movement of cells. Whether a particular protein or
polypeptide has chemotactic activity for a population of
cells can be readily determined by employing such protein
or polypeptide in any known assay for cell chemotaxis.

[0330] The activity of a protein or polypeptide of the 
invention may, among other means, be measured by the 
following methods: 

[0331] Assays for chemotactic activity (which will
identify proteins or polypeptides that induce or prevent
chemotaxis) consist of assays that measure the ability
of a protein or polypeptide to induce the migration of
cells across a membrane as well as the ability of a protein
or polypeptide to induce the adhesion of one cell
population to another cell population. Suitable assays
for movement and adhesion include, without limitation,
those described in: Current Protocols in Immunology,
Ed by J.E. Coligan, A.M. Kruisbeek, D.H. Margulies, E.
M. Shevach, W. Strober, Pub. Greene Publishing Associates
and Wiley-Interscience, Chapter 6.12:
6.12.1-6.12.28; Taub et al., J. Clin. Invest. 95:
1370-1376, 1995; Lind et al., APMIS 103:140-146, 1995;
Mueller et al., Eur. J. Immunol. 25:1744-1748; Gruber
et al., J. Immunol. 152:5860-5867, 1994; Johnston et al.,
J. Immunol. 153:1762-1768, 1994.
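
By way of illustration only, the readout of a membrane migration assay of the kind referred to above may be summarized as a ratio of migrated cell counts. The following Python sketch is not taken from the cited references; the function names and the two-fold threshold are assumptions chosen for the example.

```python
def chemotactic_index(migrated_test: float, migrated_control: float) -> float:
    """Ratio of cells that crossed the membrane toward the test protein
    to cells that crossed toward buffer alone (hypothetical readout)."""
    if migrated_control <= 0:
        raise ValueError("control count must be positive")
    return migrated_test / migrated_control


def is_chemotactic(migrated_test: float, migrated_control: float,
                   threshold: float = 2.0) -> bool:
    """Flag a protein as chemotactic when the index meets an assumed
    threshold; the value 2.0 is illustrative, not a prescribed cutoff."""
    return chemotactic_index(migrated_test, migrated_control) >= threshold
```

In practice the threshold would be set from replicate control wells rather than fixed in advance.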

EXAMPLE 28 

Assaying the Expressed Proteins or Polypeptides for
Regulation of Blood Clotting 

[0332] The proteins or polypeptides of the present in- 
vention may also be evaluated for their effects on blood 
clotting. Numerous assays for such activity are familiar 
to those skilled in the art, including the assays disclosed
in the following references: Linet et al., J. Clin. Pharmacol.
26:131-140, 1986; Burdick et al., Thrombosis Res.
45:413-419, 1987; Humphrey et al., Fibrinolysis 5:71-79
(1991); Schaub, Prostaglandins 35:467-474, 1988.
[0333] Those proteins or polypeptides which are in- 
volved in the regulation of blood clotting may then be 
formulated as pharmaceuticals and used to treat clinical 
conditions in which regulation of blood clotting is bene- 
ficial. For example, a protein or polypeptide of the inven- 
tion may also exhibit hemostatic or thrombolytic activity. 
As a result, such a protein or polypeptide is expected to 
be useful in treatment of various coagulation disorders
(including hereditary disorders, such as hemophilias) or 
to enhance coagulation and other hemostatic events in 
treating wounds resulting from trauma, surgery or other 
causes. A protein or polypeptide of the invention may 
also be useful for dissolving or inhibiting formation of 
thromboses and for treatment and prevention of condi- 
tions resulting therefrom (such as infarction of cardiac 
and central nervous system vessels (e.g., stroke)). Al- 
ternatively, as described in more detail below, nucleic 
acids encoding blood clotting activity proteins or 
polypeptides or nucleic acids regulating the expression 
of such proteins or polypeptides may be introduced into 
appropriate host cells to increase or decrease the ex- 
pression of the proteins or polypeptides as desired. 

EXAMPLE 29 

Assaying the Expressed Proteins or Polypeptides for 
Involvement in Receptor/Ligand Interactions

[0334] The proteins or polypeptides of the present in- 
vention may also be evaluated for their involvement in 
receptor/ligand interactions. Numerous assays for such
involvement are familiar to those skilled in the art, including
the assays disclosed in the following references:
Chapter 7 (7.28.1-7.28.22) in Current Protocols in Immunology,
J.E. Coligan et al. Eds. Greene Publishing
Associates and Wiley-Interscience; Takai et al., Proc.
Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et al.,
J. Exp. Med. 168:1145-1156, 1988; Rosenstein et al., J.
Exp. Med. 169:149-160, 1989; Stoltenborg et al., J. Immunol.
Methods 175:59-68, 1994; Stitt et al., Cell 80:
661-670, 1995; Gyuris et al., Cell 75:791-803, 1993.
[0335] For example, the proteins or polypeptides of
the present invention may also demonstrate activity as
receptors, receptor ligands or inhibitors or agonists of
receptor/ligand interactions. Examples of such receptors
and ligands include, without limitation, cytokine receptors
and their ligands, receptor kinases and their ligands,
receptor phosphatases and their ligands, receptors
involved in cell-cell interactions and their ligands
(including without limitation, cellular adhesion molecules
(such as selectins, integrins and their ligands) and
receptor/ligand pairs involved in antigen presentation,
antigen recognition and development of cellular and humoral
immune responses). Receptors and ligands are
also useful for screening of potential peptide or small
molecule inhibitors of the relevant receptor/ligand interaction.
A protein or polypeptide of the present invention
(including, without limitation, fragments of receptors and
ligands) may be useful as inhibitors of receptor/ligand interactions.
Alternatively, as described in more detail below,
nucleic acids encoding proteins or polypeptides involved
in receptor/ligand interactions or nucleic acids
regulating the expression of such proteins or polypeptides
may be introduced into appropriate host cells to
increase or decrease the expression of the proteins or
polypeptides as desired.

EXAMPLE 30 

Assaying the Proteins or Polypeptides for Anti- 
Inflammatory Activity 

[0336] The proteins or polypeptides of the present invention
may also be evaluated for anti-inflammatory activity.
The anti-inflammatory activity may be achieved by
providing a stimulus to cells involved in the inflammatory
response, by inhibiting or promoting cell-cell interactions
(such as, for example, cell adhesion), by inhibiting
or promoting chemotaxis of cells involved in the inflammatory
process, inhibiting or promoting cell extravasation,
or by stimulating or suppressing production of other
factors which more directly inhibit or promote an inflammatory
response. Proteins or polypeptides exhibiting
such activities can be used to treat inflammatory conditions
including chronic or acute conditions, including
without limitation inflammation associated with infection
(such as septic shock, sepsis or systemic inflammatory
response syndrome), ischemia reperfusion injury, endotoxin
lethality, arthritis, complement-mediated hyperacute
rejection, nephritis, cytokine- or chemokine-induced
lung injury, inflammatory bowel disease, Crohn's
disease or resulting from over production of cytokines
such as TNF or IL-1. Proteins or polypeptides of the invention
may also be useful to treat anaphylaxis and hypersensitivity
to an antigenic substance or material. Alternatively,
as described in more detail below, nucleic
acids encoding anti-inflammatory activity proteins or
polypeptides or nucleic acids regulating the expression
of such proteins or polypeptides may be introduced into
appropriate host cells to increase or decrease the expression
of the proteins or polypeptides as desired.

EXAMPLE 31

Assaying the Expressed Proteins or Polypeptides for
Tumor Inhibition Activity

[0337] The proteins or polypeptides of the present invention
may also be evaluated for tumor inhibition activity.
In addition to the activities described above for immunological
treatment or prevention of tumors, a protein
or polypeptide of the invention may exhibit other anti-tumor
activities. A protein or polypeptide may inhibit tumor
growth directly or indirectly (such as, for example,
via ADCC). A protein or polypeptide may exhibit its tumor
inhibitory activity by acting on tumor tissue or tumor
precursor tissue, by inhibiting formation of tissues necessary
to support tumor growth (such as, for example,
by inhibiting angiogenesis), by causing production of
other factors, agents or cell types which inhibit tumor
growth, or by suppressing, eliminating or inhibiting factors,
agents or cell types which promote tumor growth.
Alternatively, as described in more detail below, nucleic
acids encoding proteins or polypeptides with tumor inhibition
activity or nucleic acids regulating the expression
of such proteins or polypeptides may be introduced
into appropriate host cells to increase or decrease the
expression of the proteins or polypeptides as desired.
[0338] A protein or polypeptide of the invention may
also exhibit one or more of the following additional activities
or effects: inhibiting the growth, infection or function
of, or killing, infectious agents, including, without
limitation, bacteria, viruses, fungi and other parasites;
effecting (suppressing or enhancing) bodily characteristics,
including, without limitation, height, weight, hair
color, eye color, skin, fat to lean ratio or other tissue pigmentation,
or organ or body part size or shape (such as,
for example, breast augmentation or diminution, change
in bone form or shape); effecting biorhythms or circadian
cycles or rhythms; effecting the fertility of male or female
subjects; effecting the metabolism, catabolism, anabolism,
processing, utilization, storage or elimination of dietary
fat, lipid, protein, carbohydrate, vitamins, minerals,
cofactors or other nutritional factors or component(s);
effecting behavioral characteristics, including, without
limitation, appetite, libido, stress, cognition (including
cognitive disorders), depression (including depressive
disorders) and violent behaviors; providing analgesic effects
or other pain reducing effects; promoting differentiation
and growth of embryonic stem cells in lineages
other than hematopoietic lineages; hormonal or endocrine
activity; in the case of enzymes, correcting deficiencies
of the enzyme and treating deficiency-related
diseases; treatment of hyperproliferative disorders
(such as, for example, psoriasis); immunoglobulin-like
activity (such as, for example, the ability to bind antigens
or complement); and the ability to act as an antigen in
a vaccine composition to raise an immune response
against such protein or another material or entity which
is cross-reactive with such protein. Alternatively, as described
in more detail below, nucleic acids encoding proteins
or polypeptides involved in any of the above mentioned
activities or nucleic acids regulating the expression
of such proteins may be introduced into appropriate
host cells to increase or decrease the expression of the
proteins or polypeptides as desired.

EXAMPLE 32

Identification of Proteins or Polypeptides which Interact
with Proteins or Polypeptides of the Present Invention

[0339] Proteins or polypeptides which interact with
the proteins or polypeptides of the present invention,
such as receptor proteins, may be identified using two
hybrid systems such as the Matchmaker Two Hybrid
System 2 (Catalog No. K1604-1, Clontech). As described
in the manual accompanying the kit, nucleic acids
encoding the proteins or polypeptides of the present
invention are inserted into an expression vector such
that they are in frame with DNA encoding the DNA binding
domain of the yeast transcriptional activator GAL4.
cDNAs in a cDNA library which encode proteins or
polypeptides which might interact with the proteins or
polypeptides of the present invention are inserted into
a second expression vector such that they are in frame
with DNA encoding the activation domain of GAL4. The
two expression plasmids are transformed into yeast and
the yeast are plated on selection medium which selects
for expression of selectable markers on each of the expression
vectors as well as GAL4 dependent expression
of the HIS3 gene. Transformants capable of growing
on medium lacking histidine are screened for GAL4
dependent lacZ expression. Those cells which are positive
in both the histidine selection and the lacZ assay
contain plasmids encoding proteins or polypeptides
which interact with the proteins or polypeptides of the
present invention.
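
The selection logic of the two hybrid screen described above (growth on medium lacking histidine followed by a positive lacZ assay) may be sketched as a simple filter over scored transformants. The Python sketch below is illustrative only; the `Clone` record and its field names are hypothetical and do not form part of the kit or its manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    """Scored yeast transformant (hypothetical record layout)."""
    clone_id: str
    grows_without_histidine: bool  # GAL4 dependent HIS3 reporter
    lacz_positive: bool            # GAL4 dependent lacZ reporter


def candidate_interactors(clones):
    """Return the IDs of clones positive in BOTH reporter assays,
    i.e. those expected to carry an interacting library insert."""
    return [c.clone_id for c in clones
            if c.grows_without_histidine and c.lacz_positive]
```

Requiring both reporters, as the paragraph above does, is intended to reduce false positives that activate only one reporter.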

[0340] Alternatively, the system described in Lustig et
al., Methods in Enzymology 283: 83-99 (1997), may be
used for identifying molecules which interact with the
proteins or polypeptides of the present invention. In
such systems, in vitro transcription reactions are performed
on a pool of vectors containing nucleic acid inserts
which encode the proteins or polypeptides of the
present invention. The nucleic acid inserts are cloned
downstream of a promoter which drives in vitro transcription.
The resulting pools of mRNAs are introduced
into Xenopus laevis oocytes. The oocytes are then assayed
for a desired activity.

[0341] Alternately, the pooled in vitro transcription
products produced as described above may be translated
in vitro. The pooled in vitro translation products can
be assayed for a desired activity or for interaction with
a known protein or polypeptide.
[0342] Proteins, polypeptides or other molecules interacting
with proteins or polypeptides of the present invention
can be found by a variety of additional techniques.
In one method, affinity columns containing the
protein or polypeptide of the present invention can be
constructed. In some versions of this method, the affinity
column contains chimeric proteins in which the protein
or polypeptide of the present invention is fused to glutathione
S-transferase. A mixture of cellular proteins or
pool of expressed proteins as described above is
applied to the affinity column. Molecules interacting with
the protein or polypeptide attached to the column can
then be isolated and analyzed on 2-D electrophoresis
gel as described in Ramunsen et al., Electrophoresis, 18,
588-598 (1997). Alternatively, the molecules retained on
the affinity column can be purified by electrophoresis
based methods and sequenced. The same method can
be used to isolate antibodies, to screen phage display
products, or to screen phage display human antibodies.
[0343] Molecules interacting with the proteins or 
polypeptides of the present invention can also be 
screened by using an Optical Biosensor as described in 
Edwards & Leatherbarrow, Analytical Biochemistry, 
246, 1-6 (1997). The main advantage of the method is 
that it allows the determination of the association rate 
between the protein or polypeptide and other interacting 
molecules. Thus, it is possible to specifically select in- 
teracting molecules with a high or low association rate. 
Typically a target molecule is linked to the sensor surface
(through a carboxymethyl dextran matrix) and a
sample of test molecules is placed in contact with the
target molecules. The binding of a test molecule to the
target molecule causes a change in the refractive index
and/or thickness. This change is detected by the Biosensor
provided it occurs in the evanescent field (which
extends a few hundred nanometers from the sensor surface).
In these screening assays, the target molecule
can be one of the proteins or polypeptides of the present
invention and the test sample can be a collection of proteins,
polypeptides or other molecules extracted from
tissues or cells, a pool of expressed proteins, combinatorial
peptide and/or chemical libraries, or phage displayed
peptides. The tissues or cells from which the test
molecules are extracted can originate from any species.
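
As an illustration of how an association rate might be extracted from biosensor data of the kind described above, the following Python sketch assumes a simple 1:1 binding model in which the observed rate at each test molecule concentration C follows k_obs = k_on x C + k_off. The model, the function name and the example values are assumptions made for this sketch and are not taken from Edwards & Leatherbarrow.

```python
def fit_rate_constants(concentrations, k_obs):
    """Least-squares line through (C, k_obs) points.

    Under the assumed 1:1 model, the slope estimates the association
    rate constant k_on and the intercept estimates the dissociation
    rate constant k_off.
    """
    n = len(concentrations)
    if n < 2 or n != len(k_obs):
        raise ValueError("need at least two matched (C, k_obs) points")
    mean_c = sum(concentrations) / n
    mean_k = sum(k_obs) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    if sxx == 0:
        raise ValueError("concentrations must not all be equal")
    sxy = sum((c - mean_c) * (k - mean_k)
              for c, k in zip(concentrations, k_obs))
    k_on = sxy / sxx
    k_off = mean_k - k_on * mean_c
    return k_on, k_off
```

Interacting molecules could then be ranked by the fitted k_on to select those with a high or low association rate, as contemplated above.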
[0344] In other methods, a target protein or polypep- 
tide is immobilized and the test population is a collection 
of unique proteins or polypeptides of the present inven- 
tion. 

[0345] To study the interaction of the proteins or 
polypeptides of the present invention with drugs, the 
microdialysis coupled to HPLC method described by 
Wang et al., Chromatographia, 44, 205-208 (1997) or the
affinity capillary electrophoresis method described by
Busch et al., J. Chromatogr. 777:311-328 (1997) can be
used. 

[0346] The system described in U.S. Patent No.
5,654,150 may also be used to identify molecules which
interact with the proteins or polypeptides of the present
invention. In this system, pools of nucleic acids encoding
the proteins or polypeptides of the present invention
are transcribed and translated in vitro and the reaction
products are assayed for interaction with a known
polypeptide or antibody.

[0347] It will be appreciated by those skilled in the art
that the proteins or polypeptides of the present invention
may be assayed for numerous activities in addition to
those specifically enumerated above. For example, the
expressed proteins or polypeptides may be evaluated
for applications involving control and regulation of inflammation,
tumor proliferation or metastasis, infection,
or other clinical conditions. In addition, the proteins or
polypeptides may be useful as nutritional agents or cosmetic
agents.

[0348] The proteins or polypeptides of the present invention
may be used to generate antibodies capable of
specifically binding to the proteins or polypeptides of the
present invention. The antibodies may be monoclonal
antibodies or polyclonal antibodies. As used herein, "antibody"
refers to a polypeptide or group of polypeptides
which are comprised of at least one binding domain,
where a binding domain is formed from the folding of
variable domains of an antibody molecule to form three-dimensional
binding spaces with an internal surface
shape and charge distribution complementary to the
features of an antigenic determinant of an antigen,
which allows an immunological reaction with the antigen.
Antibodies include recombinant proteins comprising
the binding domains, as well as fragments, including
Fab, Fab', F(ab)2, and F(ab')2 fragments.

[0349] As used herein, an "antigenic determinant" is
the portion of an antigen molecule that determines the
specificity of the antigen-antibody reaction. An "epitope"
refers to an antigenic determinant of a polypeptide. An
epitope can comprise as few as 3 amino acids in a spatial
conformation which is unique to the epitope. Generally
an epitope consists of at least 6 such amino acids,
and more usually at least 8-10 such amino acids. Methods
for determining the amino acids which make up an
epitope include x-ray crystallography, 2-dimensional nuclear
magnetic resonance, and epitope mapping, e.g.
the Pepscan method described by H. Mario Geysen et
al. 1984. Proc. Natl. Acad. Sci. U.S.A. 81:3998-4002;
PCT Publication No. WO 84/03564; and PCT Publication
No. WO 84/03506.
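
Epitope mapping of the kind mentioned above generally relies on a set of overlapping peptides spanning the polypeptide. The Python sketch below merely enumerates such peptides; the default window of 6 residues mirrors the "at least 6" amino acids stated above, while the function name, the step of one residue and the example sequence are assumptions for illustration.

```python
def overlapping_peptides(sequence: str, length: int = 6, step: int = 1):
    """List every peptide of the given length along the sequence,
    advancing by `step` residues; returns an empty list when the
    sequence is shorter than the window."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, step)]
```

For example, `overlapping_peptides("MKTAYIAK")` yields the three hexapeptides `MKTAYI`, `KTAYIA` and `TAYIAK`, each of which could be tested separately for antibody binding.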

[0350] In some embodiments, the antibodies may be
capable of specifically binding to a protein or polypeptide
encoded by EST-related nucleic acids, fragments
of EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids. In some embodiments,
the antibody may be capable of binding an antigenic
determinant or an epitope in a protein or polypeptide
encoded by EST-related nucleic acids, fragments
of EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids.
[0351] In other embodiments, the antibodies may be 
present invention.

EXAMPLE 33

Protein 

[0353] The above described EST-related nucleic acids,
fragments of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids, or nucleic
acids encoding EST-related polypeptides, are operably
linked to promoters and introduced into cells as described
above. Antibodies to the expressed proteins or
polypeptides can then be prepared as follows.

Monoclonal Antibody Production by Hybridoma
Fusion



therefrom over a period of a few weeks. The mouse is
then sacrificed, and the antibody producing cells of the
spleen isolated. The spleen cells are fused by means of
polyethylene glycol with mouse myeloma cells, and the
excess unfused cells destroyed by growth of the system
on selective media comprising aminopterin (HAT media).
The successfully fused cells are diluted and aliquots
of the dilution placed in wells of a microtiter plate
where growth of the culture is continued. Antibody-producing
clones are identified by detection of antibody in
the supernatant fluid of the wells by immunoassay procedures,
such as ELISA, as originally described by
Engvall, E., Meth. Enzymol. 70:419 (1980). Selected positive
clones can be expanded and their monoclonal antibody
product harvested for use. Detailed procedures
for monoclonal antibody production are described in
Davis, L. et al. in Basic Methods in Molecular Biology,
Elsevier, New York. Section 21-2.
Polyclonal Antibody Production by Immunization

[0357] Polyclonal antiserum containing antibodies to
heterogenous epitopes of a single protein or polypeptide
can be prepared by immunizing suitable animals with
the expressed protein or peptides derived therefrom,
which can be unmodified or modified to enhance immunogenicity.
Effective polyclonal antibody production is
affected by many factors related both to the antigen and
the host species. For example, small molecules tend to
be less immunogenic than others and may require the
use of carriers and adjuvant. Also, host animals' response
varies depending on site of inoculations and doses,
with both inadequate or excessive doses of antigen
resulting in low titer antisera. Small doses (ng level) of
antigen administered at multiple intradermal sites appear
to be most reliable. An effective immunization protocol
for rabbits can be found in Vaitukaitis, J. et al., J. Clin.
Endocrinol. Metab. 33:988-991 (1971).
rnasai Booster injections can be given at regular in 
40 S antiserum harvested when antibody titer 
hereof' as determined semi<,uantitatrvei y . for example, 
by doubte immunodiffuston in agar against known con- 

pie ouchterlony, * al.. Chap. 19 ,n: *i^ ?J* 
<s ftn-nW Immunology D. Wier (ed, B^weMI^ 
Plateau concentration of antibody is usually n the range 
of 0 1 to 0.2 mg/ml of serum (about 12 uM). Affinity of 
l antisera foAhe antigen is determined by preparing 
TomTetitive binding curves, as described, for example 
so C b y m Fisher, D ..Chap.42»i: ^n^ofC,^ — - 
ogy, 2d Ed. (Rose and Friedman, Eds.) Amer. Soc. For 
Microbiol, Washington, O.C. (1980). 
[0359] Antibody preparations prepared according to
either of the above protocols are useful in a variety of
contexts. In particular, the antibodies may be used in
immunoaffinity chromatography techniques such as
those described below to facilitate large scale isolation,
purification, or enrichment of the proteins or polypeptides
encoded by EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids or for
the isolation, purification or enrichment of EST-related
polypeptides, fragments of EST-related polypeptides,
positional segments of EST-related polypeptides or
fragments of positional segments of EST-related
polypeptides.

[0360] In the case of secreted proteins, the antibodies
may be used for the isolation, purification, or enrichment 
of the full protein (i.e. the mature protein and the signal 
peptide), the mature protein (i.e. the protein generated 
by cleavage of the signal peptide), or the signal peptide 
are operably linked to promoters and introduced into 
cells as described above. 

[0361] Additionally, the antibodies may be used in
immunoaffinity chromatography techniques such as those
described below to isolate, purify, or enrich polypeptides
which have been linked to the proteins or polypeptides
encoded by EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids or to
isolate, purify, or enrich EST-related polypeptides,
fragments of EST-related polypeptides, positional segments
of EST-related polypeptides or fragments of positional
segments of EST-related polypeptides.
[0362] The antibodies may also be used to determine 
the cellular localization of polypeptides encoded by the 
proteins or polypeptides encoded by EST-related nucle- 
ic acids, positional segments of EST-related nucleic ac- 
ids or fragments of positional segments of EST-related 
nucleic acids or the cellular localization of EST-related 
polypeptides, fragments of EST-related polypeptides, 
positional segments of EST-related polypeptides or 
fragments of positional segments of EST-related
polypeptides. 

[0363] In addition, the antibodies may also be used to
determine the cellular localization of polypeptides which
have been linked to the proteins or polypeptides encoded
by EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids or polypeptides
which have been linked to EST-related polypeptides,
fragments of EST-related polypeptides, positional segments
of EST-related polypeptides or fragments of positional
segments of EST-related polypeptides.
[0364] The antibodies may also be used in quantitative
immunoassays which determine concentrations of
antigen-bearing substances in biological samples; they
may also be used semi-quantitatively or qualitatively to
identify the presence of antigen in a biological sample
or to identify the type of tissue present in a biological
sample. The antibodies may also be used in therapeutic
compositions for killing cells expressing the protein or
reducing the levels of the protein in the body.



V. Use of 5'ESTs and Consensus Contigated 5' ESTs 
or Sequences Obtainable Therefrom or Portions 
Thereof as Reagents 

[0365] The EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may be
used as reagents in isolation procedures, diagnostic
assays, and forensic procedures. For example, sequences
from the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may be
detectably labeled and used as probes to isolate other
sequences capable of hybridizing to them. In addition,
the EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids may be used to
design PCR primers to be used in isolation, diagnostic,
or forensic procedures.

1. Use of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids in
isolation, diagnostic and forensic procedures

EXAMPLE 34 

Preparation of PCR Primers and Amplification of DNA

[0366] The EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may be
used to prepare PCR primers for a variety of applications,
including isolation procedures for cloning nucleic
acids capable of hybridizing to such sequences,
diagnostic techniques and forensic techniques. In some
embodiments, the PCR primers are at least 10, 15, 18, 20, 23,
25, 28, 30, 40, or 50 nucleotides in length. In some
embodiments, the PCR primers may be more than 30
bases in length. It is preferred that the primer pairs have
approximately the same G/C ratio, so that melting
temperatures are approximately the same. A variety of PCR
techniques are familiar to those skilled in the art. For a
review of PCR technology, see Molecular Cloning to
Genetic Engineering, White, B.A. Ed., in Methods in
Molecular Biology 67: Humana Press, Totowa 1997. In each
of these PCR procedures, PCR primers on either side
of the nucleic acid sequences to be amplified are added
to a suitably prepared nucleic acid sample along with
dNTPs and a thermostable polymerase such as Taq
polymerase, Pfu polymerase, or Vent polymerase. The
nucleic acid in the sample is denatured and the PCR
primers are specifically hybridized to complementary
nucleic acid sequences in the sample. The hybridized
primers are extended. Thereafter, another cycle of
denaturation, hybridization, and extension is initiated. The
cycles are repeated multiple times to produce an
amplified fragment containing the nucleic acid sequence
between the primer sites.
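
By way of illustration only, the preference that primer pairs have
approximately the same G/C ratio and melting temperature may be
checked computationally. The following Python sketch is not part of
the described procedure. It assumes the rule-of-thumb estimate of
2 degrees C per A/T and 4 degrees C per G/C (the Wallace rule, which
is only reasonable for short oligonucleotides), and the function
names and the 4 degree tolerance are hypothetical choices made for
this example.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer (case-insensitive)."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in degrees C.

    Assumption: Tm = 2 * (A + T) + 4 * (G + C), a rough estimate
    suitable only for short oligonucleotides.
    """
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def primers_compatible(forward: str, reverse: str,
                       max_tm_diff: int = 4, min_len: int = 10) -> bool:
    """True if both primers meet the minimum length and their estimated
    melting temperatures differ by no more than max_tm_diff."""
    if len(forward) < min_len or len(reverse) < min_len:
        return False
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_tm_diff


if __name__ == "__main__":
    fwd = "ATGCGTACCTGAGGTCAA"   # hypothetical 18-mer
    rev = "TTGACGCATCCAGGTTAC"   # hypothetical 18-mer
    print(gc_fraction(fwd), wallace_tm(fwd), primers_compatible(fwd, rev))
```

A more careful design would use a nearest-neighbor melting model and
account for salt and primer concentration; the simple rule is used
here only to make the G/C matching idea concrete.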
EXAMPLE 35 

Use of the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids as
probes

[0367] Probes derived from EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids may be labeled with detectable labels familiar to
those skilled in the art, including radioisotopes and
non-radioactive labels, to provide a detectable probe. The
detectable probe may be single stranded or double
stranded and may be made using techniques known in
the art, including in vitro transcription, nick translation,
or kinase reactions. A nucleic acid sample containing a
sequence capable of hybridizing to the labeled probe is
contacted with the labeled probe. If the nucleic acid in
the sample is double stranded, it may be denatured prior
to contacting the probe. In some applications, the
nucleic acid sample may be immobilized on a surface such
as a nitrocellulose or nylon membrane. The nucleic acid
sample may comprise nucleic acids obtained from a
variety of sources, including genomic DNA, cDNA
libraries, RNA, or tissue samples.

[0368] Procedures used to detect the presence of nu- 
cleic acids capable of hybridizing to the detectable 
probe include well known techniques such as Southern 
blotting, Northern blotting, dot blotting, colony hybridi- 
zation, and plaque hybridization. In some applications, 
the nucleic acid capable of hybridizing to the labeled 
probe may be cloned into vectors such as expression 
vectors, sequencing vectors, or in vitro transcription 
vectors to facilitate the characterization and expression 
of the hybridizing nucleic acids in the sample. For ex- 
ample, such techniques may be used to isolate and 
clone sequences in a genomic library or cDNA library 
which are capable of hybridizing to the detectable probe 
as described in Example 18 above. 
[0369] PCR primers made as described in Example 
34 above may be used in forensic analyses, such as the 
DNA fingerprinting techniques described in Examples 
36-40 below. Such analyses may utilize detectable 
probes or primers based on the sequences of the EST- 
related nucleic acids, positional segments of EST-relat- 
ed nucleic acids or fragments of positional segments of 
EST-related nucleic acids. 

EXAMPLE 36 

Forensic Matching by DNA Sequencing

[0370] In one exemplary method, DNA samples are
isolated from forensic specimens of, for example, hair,
semen, blood or skin cells by conventional methods. A
panel of PCR primers based on a number of the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids is then utilized in accordance
with Example 34 to amplify DNA of approximately
100-200 bases in length from the forensic specimen.
Corresponding sequences are obtained from a test
subject. Each of these identification DNAs is then
sequenced using standard techniques, and a simple
database comparison determines the differences, if any,
between the sequences from the subject and those from
the sample. Statistically significant differences between
the suspect's DNA sequences and those from the
sample conclusively prove a lack of identity. This lack of
identity can be proven, for example, with only one
sequence. Identity, on the other hand, should be
demonstrated with a large number of sequences, all matching.
Preferably, a minimum of 50 statistically identical
sequences of 100 bases in length are used to prove
identity between the suspect and the sample.
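
The decision logic of this example (a single differing sequence
excludes identity, whereas identity requires many matching
sequences) can be sketched as follows. This is an illustrative
Python outline only. It assumes error-free sequence reads and uses
exact string equality in place of the statistical comparison
referred to above; the function names, the locus labels and the
dictionary layout are hypothetical.

```python
def compare_panels(specimen: dict, subject: dict) -> dict:
    """Compare two panels of sequences locus by locus.

    Each panel maps a locus label to the sequence obtained for it.
    Only loci present in both panels are compared.
    """
    shared = sorted(set(specimen) & set(subject))
    mismatched = [locus for locus in shared
                  if specimen[locus].upper() != subject[locus].upper()]
    return {"compared": len(shared), "mismatched": mismatched}


def verdict(specimen: dict, subject: dict,
            min_matching_loci: int = 50) -> str:
    """Return 'excluded', 'match' or 'inconclusive'.

    One differing locus is enough to exclude identity; a match is
    only reported when at least min_matching_loci loci all agree.
    """
    result = compare_panels(specimen, subject)
    if result["mismatched"]:
        return "excluded"
    if result["compared"] >= min_matching_loci:
        return "match"
    return "inconclusive"


if __name__ == "__main__":
    specimen = {f"locus{i}": "ACGT" * 25 for i in range(50)}
    subject = dict(specimen)
    print(verdict(specimen, subject))
    subject["locus7"] = "ACGA" + "ACGT" * 24
    print(verdict(specimen, subject))
```

The default threshold of 50 loci mirrors the stated preference for a
minimum of 50 matching sequences of 100 bases; a practical
implementation would also need to model sequencing error before
declaring a difference significant.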

EXAMPLE 37 

Positive Identification by DNA Sequencing 

[0371] The technique outlined in the previous
example may also be used on a larger scale to provide a
unique fingerprint-type identification of any individual. In
this technique, primers are prepared from a large
number of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids.
Preferably, 20 to 50 different primers are used. These primers
are used to obtain a corresponding number of
PCR-generated DNA segments from the individual in question in
accordance with Example 34. Each of these DNA
segments is sequenced, using the methods set forth in
Example 36. The database of sequences generated
through this procedure uniquely identifies the individual
from whom the sequences were obtained. The same
panel of primers may then be used at any later time to
absolutely correlate tissue or other biological specimen
with that individual.

EXAMPLE 38

Southern Blot Forensic Identification 

[0372] The procedure of Example 37 is repeated to
obtain a panel of at least 10 amplified sequences from
an individual and a specimen. Preferably, the panel
contains at least 50 amplified sequences. More preferably,
the panel contains 100 amplified sequences. In some
embodiments, the panel contains 200 amplified
sequences. This PCR-generated DNA is then digested
with one or a combination of, preferably, four base
specific restriction enzymes. Such enzymes are
commercially available and known to those of skill in the art. After
digestion, the resultant gene fragments are size
separated in multiple duplicate wells on an agarose gel and
transferred to nitrocellulose using Southern blotting
techniques well known to those with skill in the art. For
a review of Southern blotting see Davis et al. (Basic
Methods in Molecular Biology, 1986, Elsevier Press, pp
62-65).

[0373] A panel of probes based on the sequences of
the EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids is radioactively
or colorimetrically labeled using methods known in the
art, such as nick translation or end labeling, and
hybridized to the Southern blot using techniques known in the
art (Davis et al., supra). Preferably, the probes are at least
10, 12, 15, 18, 20, 25, 28, 30, 35, 40, 50, 75, 100, 150,
200, 300, 400 or 500 nucleotides in length. In some
embodiments, the probes are oligonucleotides which are
40 nucleotides in length or less.

[0374] Preferably, at least 5 to 10 of these labeled 
probes are used, and more preferably at least about 20 
or 30 are used to provide a unique pattern. The resultant 
bands appearing from the hybridization of a large sam- 
ple of EST-related nucleic acids, positional segments of 
EST-related nucleic acids or fragments of positional 
segments of EST-related nucleic acids will be a unique 
identifier. Since the restriction enzyme cleavage will be 
different for every individual, the band pattern on the 
Southern blot will also be unique. Increasing the number 
of probes will provide a statistically higher level of con- 
fidence in the identification since there will be an in- 
creased number of sets of bands used for identification. 
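
The relationship between sequence differences at restriction sites
and the resulting band pattern can be illustrated with a short
Python sketch. It is not part of the described method. It models a
single linear strand, ignores partial digestion and gel resolution,
and uses the four-base recognition site GGCC (cut between the second
G and the first C, as for HaeIII) purely as an example; the function
name and the toy sequences are hypothetical.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list:
    """Return fragment lengths after cutting a linear sequence at
    every occurrence of a recognition site.

    cut_offset is the number of bases of the site that remain on the
    upstream fragment (2 for a GG^CC cut).
    """
    seq = sequence.upper()
    site = site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # Skip zero-length pieces that would arise from a cut at either end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    allele_a = "AAAAGGCCTTTTTTGGCCAA"   # two sites
    allele_b = "AAAAGGCCTTTTTTGGTCAA"   # second site lost by one change
    print(digest(allele_a, "GGCC", 2))
    print(digest(allele_b, "GGCC", 2))
```

In this toy case a single base change removes one site, so the
fragment sizes change from three bands to two, which is the kind of
difference a probe hybridizing to the region would reveal on the
blot.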

EXAMPLE 39 

Dot Blot Identification Procedure 

[0375] Another technique for identifying individuals 
using the EST-related nucleic acids, positional seg- 
ments of EST-related nucleic acids or fragments of po- 
sitional segments of EST-related nucleic acids dis- 
closed herein utilizes a dot blot hybridization technique. 
[0376] Genomic DNA is isolated from nuclei of the subject
to be identified. Probes are prepared that correspond to
at least 10, preferably 50 sequences from the
EST-related nucleic acids, positional segments of EST-related
nucleic acids or fragments of positional segments of
EST-related nucleic acids. The probes are used to
hybridize to the genomic DNA through conditions known
to those in the art. The oligonucleotides are end labeled
with 32P using polynucleotide kinase (Pharmacia). Dot
blots are created by spotting the genomic DNA onto
nitrocellulose or the like using a vacuum dot blot manifold
(BioRad, Richmond, California). The nitrocellulose filter
containing the genomic sequences is baked or UV
linked to the filter, prehybridized and hybridized with
labeled probe using techniques known in the art (Davis et
al., supra). The 32P labeled DNA fragments are
sequentially hybridized with successively stringent conditions
to detect minimal differences between the 30 bp
sequence and the DNA. Tetramethylammonium chloride
is useful for identifying clones containing small numbers
of nucleotide mismatches (Wood et al., Proc. Natl. Acad.
Sci. USA 82(6):1585-1588 (1985)). A unique pattern of
dots distinguishes one individual from another
individual.

[0377] EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids can be
used as probes in the following alternative fingerprinting
technique. In some embodiments, the probes are
oligonucleotides which are 40 nucleotides in length or less.
[0378] Preferably, a plurality of probes having
sequences from different EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids are used in the alternative fingerprinting technique.
Example 40 below provides a representative alternative
fingerprinting procedure in which the probes are derived
from EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids.

EXAMPLE 40 

Alternative 'Fingerprint' Identification Technique

[0379] Oligonucleotides are prepared from a large
number, e.g. 50, 100, or 200, EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids using commercially available oligonucleotide
services such as Genset, Paris, France. Preferably, the
oligonucleotides are at least 10, 15, 18, 20, 23, 25, 28,
or 30 nucleotides in length. However, in some
embodiments, the oligonucleotides may be more than 30
nucleotides in length.

[0380] Cell samples from the test subject are
processed for DNA using techniques well known to those
with skill in the art. The nucleic acid is digested with
restriction enzymes such as EcoRI and XbaI. Following
digestion, samples are applied to wells for
electrophoresis. The procedure, as known in the art, may be modified
to accommodate polyacrylamide electrophoresis;
however, in this example, samples containing 5 ug of DNA
are loaded into wells and separated on 0.8% agarose
gels. The gels are transferred onto nitrocellulose using
standard Southern blotting techniques.
[0381] 10 ng of each of the oligonucleotides are
pooled and end-labeled with 32P. The nitrocellulose is
prehybridized with blocking solution and hybridized with
the labeled probes. Following hybridization and
washing, the nitrocellulose filter is exposed to X-Omat AR
X-ray film. The resulting hybridization pattern will be
unique for each individual.

[0382] It is additionally contemplated within this ex- 
ample that the number of probe sequences used can be 
varied for additional accuracy or clarity. 
[0383] In addition to their applications in forensics and 
identification, EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may be
mapped to their chromosomal locations. Example 41 
below describes radiation hybrid (RH) mapping of hu- 
man chromosomal regions using EST-related nucleic 
acids, positional segments of EST-related nucleic acids 
or fragments of positional segments of EST-related nu- 
cleic acids. Example 42 below describes a representa- 
tive procedure for mapping EST-related nucleic acids, 
positional segments of EST-related nucleic acids or 
fragments of positional segments of EST-related nucleic 
acids to their locations on human chromosomes. Exam- 
ple 43 below describes mapping of EST-related nucleic 
acids, positional segments of EST-related nucleic acids 
or fragments of positional segments of EST-related
nucleic acids on metaphase chromosomes by
Fluorescence In Situ Hybridization (FISH).

2. Use of EST-related nucleic acids, positional 
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids in 
Chromosome Mapping 

EXAMPLE 41 

Radiation hybrid mapping of EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids to the human genome

[0384] Radiation hybrid (RH) mapping is a somatic
cell genetic approach that can be used for high
resolution mapping of the human genome. In this approach,
cell lines containing one or more human chromosomes
are lethally irradiated, breaking each chromosome into
fragments whose size depends on the radiation dose.
These fragments are rescued by fusion with cultured
rodent cells, yielding subclones containing different
portions of the human genome. This technique is described
by Benham et al. (Genomics 4:509-517, 1989) and Cox
et al. (Science 250:245-250, 1990). The random and
independent nature of the subclones permits efficient
mapping of any human genome marker. Human DNA
isolated from a panel of 80-100 cell lines provides a
mapping reagent for ordering EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids. In this approach, the frequency of breakage
between markers is used to measure distance, allowing
construction of fine resolution maps as has been done
using conventional ESTs (Schuler et al., Science
274:540-546, 1996).
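
To make concrete how a frequency of breakage between two markers
can be converted into a distance, the following Python sketch may be
considered. The text above gives no formula, so the calculation is
an assumption: it uses a commonly cited two-point estimator in which
the number of hybrids retaining only one of the two markers is
divided by T(R_A + R_B - 2 R_A R_B), where T is the number of
hybrids and R_A and R_B are the retention frequencies, and the
distance is taken as -ln(1 - theta) expressed in centiRays. The
function names and the example data are hypothetical.

```python
import math


def breakage_frequency(marker_a, marker_b) -> float:
    """Two-point estimate of the breakage frequency (theta).

    marker_a and marker_b are equal-length sequences of 0/1 (or
    bool) values, one entry per hybrid cell line, with 1 meaning
    that the marker was retained in that hybrid.
    """
    if len(marker_a) != len(marker_b) or len(marker_a) == 0:
        raise ValueError("retention patterns must be non-empty and equal in length")
    total = len(marker_a)
    discordant = sum(1 for a, b in zip(marker_a, marker_b) if bool(a) != bool(b))
    r_a = sum(1 for a in marker_a if a) / total
    r_b = sum(1 for b in marker_b if b) / total
    denom = total * (r_a + r_b - 2 * r_a * r_b)
    if denom == 0:
        raise ValueError("retention frequencies carry no information")
    # Sampling noise can push the raw estimate above 1; cap it.
    return min(discordant / denom, 1.0)


def distance_centirays(theta: float) -> float:
    """Map a breakage frequency to a distance in centiRays."""
    if theta >= 1.0:
        return math.inf
    return -100.0 * math.log(1.0 - theta)


if __name__ == "__main__":
    a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
    b = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
    theta = breakage_frequency(a, b)
    print(round(theta, 3), round(distance_centirays(theta), 1))
```

Real panels contain on the order of 80-100 hybrids, and map
construction normally relies on multipoint likelihood methods rather
than pairwise estimates; the sketch shows only the basic
distance idea.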

[0385] RH mapping has been used to generate a
high-resolution whole genome radiation hybrid map of
human chromosome 17q22-q25.3 across the genes for
growth hormone (GH) and thymidine kinase (TK)
(Foster et al., Genomics 33:185-192, 1996), the region
surrounding the Gorlin syndrome gene (Obermayr et al.,
Eur. J. Hum. Genet. 4:242-245, 1996), 60 loci covering
the entire short arm of chromosome 12 (Raeymaekers
et al., Genomics 29:170-178, 1995), the region of
human chromosome 22 containing the neurofibromatosis
type 2 locus (Frazer et al., Genomics 14:574-584, 1992)
and 13 loci on the long arm of chromosome 5
(Warrington et al., Genomics 11:701-708, 1991).

EXAMPLE 42 

Mapping of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids to
Human Chromosomes using PCR techniques

[0386] EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may be
assigned to human chromosomes using PCR based
methodologies. In such approaches, oligonucleotide
primer pairs are designed from EST-related nucleic
acids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids to minimize the chance of amplifying through an
intron. Preferably, the oligonucleotide primers are 18-23
bp in length and are designed for PCR amplification. The
creation of PCR primers from known sequences is well
known to those with skill in the art. For a review of PCR
technology see Erlich, in PCR Technology: Principles
and Applications for DNA Amplification, 1992, W.H.
Freeman and Co., New York.

[0387] The primers are used in polymerase chain
reactions (PCR) to amplify templates from total human
genomic DNA. PCR conditions are as follows: 60 ng of
genomic DNA is used as a template for PCR with 80 ng of
each oligonucleotide primer, 0.6 unit of Taq polymerase,
and 1 uCi of a 32P-labeled deoxycytidine triphosphate.
The PCR is performed in a microplate thermocycler
(Techne) under the following conditions: 30 cycles of
94°C, 1.4 min; 55°C, 2 min; and 72°C, 2 min; with a final
extension at 72°C for 10 min. The amplified products
are analyzed on a 6% polyacrylamide sequencing gel
and visualized by autoradiography. If the length of the
resulting PCR product is identical to the distance
between the ends of the primer sequences in the 5' EST
from which the primers are derived, then the PCR
reaction is repeated with DNA templates from two panels of
human-rodent somatic cell hybrids, BIOS PCRable
DNA (BIOS Corporation) and NIGMS Human-Rodent
Somatic Cell Hybrid Mapping Panel Number 1 (NIGMS,
Camden, NJ).
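
The length check described above (proceeding to the hybrid panels
only if the genomic PCR product has the length predicted from the
5' EST) can be sketched in Python as follows. This is an
illustrative outline under simplifying assumptions: both primers
match the EST exactly, the reverse primer is supplied 5' to 3' as
synthesized, and the observed length is known exactly. All function
names and sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def expected_product_length(est: str, forward_primer: str, reverse_primer: str):
    """Distance between the outer ends of the two primer sites in the
    EST, or None if either primer site cannot be found."""
    est = est.upper()
    start = est.find(forward_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its site on
    # the EST strand is the reverse complement of the primer.
    rev_site = reverse_complement(reverse_primer)
    rev_start = est.find(rev_site, start + 1)
    if rev_start == -1:
        return None
    return rev_start + len(rev_site) - start


def proceed_to_hybrid_panels(est: str, forward_primer: str, reverse_primer: str,
                             observed_length: int, tolerance: int = 0) -> bool:
    """True if the observed genomic product matches the EST-derived
    length, suggesting that no intron lies between the primers."""
    expected = expected_product_length(est, forward_primer, reverse_primer)
    return expected is not None and abs(observed_length - expected) <= tolerance


if __name__ == "__main__":
    est = "GGATCCAGTC" + "A" * 30 + "TTGACCAGGA"   # toy 50-base EST
    fwd, rev = "GGATCCAGTC", "TCCTGGTCAA"
    print(expected_product_length(est, fwd, rev))
    print(proceed_to_hybrid_panels(est, fwd, rev, observed_length=900))
```

A genomic product much longer than the expected length, as in the
last call, would be consistent with amplification through an intron.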
[0388] PCR is used to screen a series of somatic cell
[...] chromosomes for the presence of a given 5' EST. DNA is
[...] templates for PCR reactions using the primer pairs from the
EST-related nucleic acids, positional segments of EST-related
nucleic acids or fragments of positional segments
of EST-related nucleic acids. Only those somatic
cell hybrids with chromosomes containing the human
[...] pattern of PCR products from the somatic hybrid
DNA templates. The single human chromosome
present in all cell hybrids that give rise to an amplified
fragment is the chromosome containing that EST-related
[...] nucleic acids or fragments of positional segments of
EST-related nucleic acids. For a review of techniques and
analysis of results from somatic cell gene mapping
experiments, see Ledbetter et al., Genomics 6:475-481 [...].
[0389] Alternatively, the EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids may be mapped to individual chromosomes using
FISH as described in Example 43 below.
EXAMPLE 43 

Mapping of EST-related nucleic acids, positional
[...] Chromosomes Using
Fluorescence In Situ Hybridization

[0390] Fluorescence in situ hybridization allows the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to be mapped to a
particular location on a given chromosome. The
chromosomes to be used for fluorescence in situ hybridization
[...] sources including cell cultures, tissues, or whole blood.
[0391] In a preferred embodiment, chromosomal
localization of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids are
obtained by FISH as described by Cherif et al. (Proc. Natl.
Acad. Sci. USA, 87:6639-6643, 1990). Metaphase
chromosomes are prepared from phytohemagglutinin
(PHA)-stimulated blood cell donors. PHA-stimulated
lymphocytes from healthy males are cultured for [...] in
RPMI-1640 medium. For synchronization, methotrexate
(10 uM) is added for 17 h, followed by addition of
5-bromodeoxyuridine (5-BrdU, 0.1 mM) for 6 h. Colcemid (1
ug/ml) is added for the last 15 min before harvesting the
cells. Cells are collected, washed in RPMI, incubated
with a hypotonic solution of KCl (75 mM) at 37°C for 15
min and fixed in three changes of methanol:acetic acid
(3:1). The cell suspension is dropped onto a glass slide
and air dried. The EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids is
labeled with biotin-16 dUTP by nick translation according
to the manufacturer's instructions (Bethesda Research
Laboratories, Bethesda, MD), purified using a Sephadex
G-50 column (Pharmacia, Upsala, Sweden) and
precipitated. Just prior to hybridization, the DNA pellet
is dissolved in hybridization buffer (50% formamide, 2 X
SSC, 10% dextran sulfate, 1 mg/ml sonicated salmon
sperm DNA, pH 7) and the probe is denatured at 70°C
for 5-10 min.
[0392] Slides kept at -20°C are treated for 1 h at 37°C
with RNase A (100 ug/ml), rinsed three times in 2 X SSC
and dehydrated in an ethanol series. Chromosome
preparations are denatured in 70% formamide, 2 X SSC
for 2 min at 70°C, then dehydrated at 4°C. The slides
are treated with proteinase K (10 ug/100 ml in 20 mM
Tris-HCl, 2 mM CaCl2) at 37°C for [...] min and dehydrated.
The hybridization mixture containing the probe is
placed on the slide, covered with a coverslip, sealed with
rubber cement and incubated overnight in a humid
chamber at 37°C. After hybridization and
post-hybridization washes, the biotinylated probe is detected by
avidin-FITC and amplified with additional layers of
biotinylated goat anti-avidin and avidin-FITC. For
chromosomal localization, fluorescent R-bands are obtained as
previously described (Cherif et al., supra). The slides
are observed under a LEICA fluorescence microscope
(DMRXA). Chromosomes are counterstained with
propidium iodide and the fluorescent signal of the probe
appears as two symmetrical yellow-green spots on both
chromatids of the fluorescent R-band chromosome
(red). Thus, a particular EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids may be localized to a particular cytogenetic R-band
on a given chromosome. Once the EST-related nucleic
acids, positional segments of EST-related nucleic acids
or fragments of positional segments of EST-related
nucleic acids have been assigned to particular
chromosomes using the techniques described in Examples
41-43 above, they may be utilized to construct a high
resolution map of the chromosomes on which they are
located or to identify the chromosomes in a sample.



EXAMPLE 44 

Use of EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to Construct or
Expand Chromosome Maps

[0393] Chromosome mapping involves assigning a
[...] particular chromosome as [...]
nucleic acids, positional segments of EST-related
[...] nucleic acids are obtained. [...]
chromosome is broken into [...]
whose position is to be determined [...]
to determine the location of each of the EST-related [...]

As described in Example 45 below, EST-related
nucleic acids, positional segments of EST-related
nucleic acids [...] hereditary disease or drug response.

[...] Identification



the association of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids with
particular phenotypic characteristics. In this example, a
particular EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids is used
as a test probe to associate that EST-related nucleic
acids, positional segments of EST-related nucleic acids or
[...] acids with a particular phenotypic characteristic.
[0396] EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids are
mapped to a particular location on a human chromosome
using techniques such as those described in
Examples 41 and 42 or other [...].
A search of Mendelian Inheritance in Man (V. McKusick,
Johns Hopkins University Welch Medical [...])
reveals the region of the human chromosome which
contains the EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to be a very gene
rich region containing several known genes and several
[...] identified. The gene corresponding to this EST-related
nucleic acids, positional segments of EST-related
[...] related nucleic acids thus becomes an immediate
candidate for each of these genetic diseases.
[0397] Cells from patients with these diseases or
phenotypes are isolated and expanded in culture. PCR
primers from the EST-related nucleic acids, positional
[...] the patient. EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids that
are not amplified in the patients can be positively
associated with a particular disease by further analysis.
Alternatively, the PCR analysis may yield fragments of
different lengths when the samples are derived from an
[...] disease than when the sample is derived from a healthy
individual, indicating that the gene [...] EST-related
nucleic acids, positional segments of EST-related
nucleic acids or fragments of positional segments of
EST-related nucleic acids may be responsible for the
genetic disease.



EXAMPLE 45 

Identification of genes associated with hereditary
diseases or drug response



[0395] This example illustrates an approach useful for



VI. Use of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments
of positional segments of EST-related nucleic acids
to Construct Vectors

[0398] The present EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may
also be used to construct secretion vectors capable of
directing the secretion of the proteins encoded by genes
inserted therein. Such secretion vectors may facilitate
the purification or enrichment of the proteins encoded
by genes inserted therein by reducing the number of
background proteins from which the desired protein
must be purified or enriched. Exemplary secretion
vectors are described in Example 46 below.

1. Construction of secretion vectors

EXAMPLE 46

Construction of Secretion Vectors



[0399] The secretion vectors of the present invention
include a promoter capable of directing gene expression
in the host cell, tissue, or organism of interest. Such
promoters include the Rous Sarcoma Virus promoter, the
SV40 promoter, the human cytomegalovirus promoter,
and other promoters familiar to those skilled in the art.
[0400] A signal sequence from one of the EST-related
nucleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-related
nucleic acids is operably linked to the promoter
such that the mRNA transcribed from the promoter will
direct the translation of the signal peptide. Preferably,
the signal sequence is from one of the nucleic acids of
SEQ ID NOs. 24-4100. The host cell, tissue, or organism
may be any cell, tissue, or organism which recognizes
the signal peptide encoded by the signal sequence
in the EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids. Suitable
hosts include mammalian cells, tissues or organisms,
avian cells, tissues, or organisms, insect cells, tissues
or organisms, or yeast.
[0401] In addition, the secretion vector contains cloning
sites for inserting genes encoding the proteins which
are to be secreted. The cloning sites facilitate the cloning
of the insert gene in frame with the signal sequence
such that a fusion protein in which the signal peptide is
fused to the protein encoded by the inserted gene is
expressed from the mRNA transcribed from the promoter.
The signal peptide directs the extracellular secretion of
the fusion protein.
[0402] The secretion vector may be DNA or RNA and
may integrate into the chromosome of the host, be stably
maintained as an extrachromosomal replicon in the
host, be an artificial chromosome, or be transiently
present in the host. Preferably, the secretion vector is
maintained in multiple copies in each host cell. As used
herein, multiple copies means at least 2, 5, 10, 20, 25,
50 or more than 50 copies per cell. In some embodiments,
the multiple copies are maintained extrachromosomally.
In other embodiments, the multiple copies result
from amplification of a chromosomal sequence.
[0403] Many nucleic acid backbones suitable for use
as secretion vectors are known to those skilled in the
art, including retroviral vectors, SV40 vectors, Bovine
Papilloma Virus vectors, yeast integrating plasmids,
yeast episomal plasmids, yeast artificial chromosomes,
human artificial chromosomes, P element vectors,
baculovirus vectors, or bacterial plasmids capable of being
transiently introduced into the host.
[0404] The secretion vector may also contain a polyA
signal such that the polyA signal is located downstream
of the gene inserted into the secretion vector.
[0405] After the gene encoding the protein for which
secretion is desired is inserted into the secretion vector,
the secretion vector is introduced into the host cell,
tissue, or organism using calcium phosphate precipitation,
DEAE-Dextran, electroporation, liposome-mediated
transfection, viral particles or as naked DNA. The
protein encoded by the inserted gene is then purified or
enriched from the supernatant using conventional
techniques such as ammonium sulfate precipitation,
immunoprecipitation, immunoaffinity chromatography, size
exclusion chromatography, ion exchange chromatography,
and HPLC. Alternatively, the secreted protein may
be in a sufficiently enriched or pure state in the
supernatant or growth media of the host to permit it to be used
for its intended purpose without further enrichment.
[0406] The signal sequences may also be inserted into
vectors designed for gene therapy. In such vectors,
the signal sequence is operably linked to a promoter
such that mRNA transcribed from the promoter encodes
the signal peptide. A cloning site is located downstream
of the signal sequence such that a gene encoding a
protein whose secretion is desired may readily be inserted
into the vector and fused to the signal sequence. The
vector is introduced into an appropriate host cell. The
protein expressed from the promoter is secreted
extracellularly, thereby producing a therapeutic effect.



EXAMPLE 47

Fusion Vectors

[0407] The EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may be
used to construct fusion vectors for the expression of
chimeric polypeptides. The chimeric polypeptides comprise
a first polypeptide portion and a second polypeptide
portion. In the fusion vectors of the present invention,
nucleic acids encoding the first polypeptide portion
and the second polypeptide portion are joined in frame
with one another so as to generate a nucleic acid
encoding the chimeric polypeptide. The nucleic acid
encoding the chimeric polypeptide is operably linked to a
promoter which directs the expression of an mRNA
encoding the chimeric polypeptide. The promoter may be
in any of the expression vectors described herein,
including those described in Examples 20 and 46.
[0408] Preferably, the fusion vector is maintained in 
multiple copies in each host cell. In some embodiments, 
the multiple copies are maintained extrachromosomally. 
In other embodiments, the multiple copies result from 
amplification of a chromosomal sequence. 
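
The requirement that the two polypeptide-encoding portions of a fusion vector be joined in frame can be illustrated with a short sketch. The function names and the toy sequences below are hypothetical and are not taken from this specification; the check simply assumes that each portion is supplied as a coding sequence made of whole codons, and that only the last codon of the second portion may be a stop codon.

```python
# Hypothetical illustration: names and sequences are not from the specification.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a DNA string into consecutive three-base codons (extra bases are dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def fuse_in_frame(first_orf, second_orf):
    """Join two coding sequences so that the second is read in the frame of the first.

    first_orf  -- coding sequence of the first polypeptide portion, no stop codon
    second_orf -- coding sequence of the second portion, may end with a stop codon
    """
    first_orf, second_orf = first_orf.upper(), second_orf.upper()
    # A length that is not a multiple of three would shift the downstream frame.
    if len(first_orf) % 3 or len(second_orf) % 3:
        raise ValueError("each portion must be a whole number of codons")
    # A stop codon in the first portion would end translation before the junction.
    if any(c in STOP_CODONS for c in codons(first_orf)):
        raise ValueError("stop codon inside the first portion")
    # Only the final codon of the second portion may be a stop codon.
    if any(c in STOP_CODONS for c in codons(second_orf)[:-1]):
        raise ValueError("premature stop codon inside the second portion")
    return first_orf + second_orf


fusion = fuse_in_frame("ATGGCC", "GGTAAATAA")
print(fusion, codons(fusion))
```

A real construct would also need to account for any linker or cloning-site bases between the two portions, which this sketch leaves out.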
[0409] The first polypeptide portion may comprise any 
of the polypeptides encoded by the EST-related nucleic 
acids, positional segments of EST-related nucleic acids 
or fragments of positional segments of EST-related nu- 
cleic acids. In some embodiments, the first polypeptide 
portion may be one of the EST-related polypeptides, 
fragments of EST-related polypeptides, positional seg- 
ments of EST-related polypeptides, or fragments of po- 
sitional segments of EST-related polypeptides. 
[0410] The second polypeptide portion may comprise
any polypeptide of interest. In some embodiments, the
second polypeptide portion may comprise a polypeptide
having a detectable enzymatic activity such as green
fluorescent protein or β galactosidase. Chimeric
polypeptides in which the second polypeptide portion comprises
a detectable polypeptide may be used to determine the
intracellular localization of the first polypeptide portion.
In such procedures, the fusion vector encoding the
chimeric polypeptide is introduced into a host cell under
conditions which facilitate the expression of the chimeric
polypeptide. Where appropriate, the cells are treated
with a detection reagent which is visible under the
microscope following a catalytic reaction with the
detectable polypeptide and the cellular location of the detection
reagent is determined. For example, if the polypeptide
having a detectable enzymatic activity is β galactosidase,
the cells may be treated with Xgal. Alternatively,
where the detectable polypeptide is directly detectable
without the addition of a detection reagent, the
intracellular location of the chimeric polypeptide is determined
by performing microscopy under conditions in which the
detectable polypeptide is visible. For example, if the
detectable polypeptide is green fluorescent protein or a
modified version thereof, microscopy is performed by
exposing the host cells to light having an appropriate
wavelength to cause the green fluorescent protein or
modified version thereof to fluoresce.
[0411] Alternatively, the second polypeptide portion 
may comprise a polypeptide whose isolation, purifica- 
tion, or enrichment is desired. In such embodiments, the 
isolation, purification, or enrichment of the second 
polypeptide portion may be achieved by performing the 
immunoaffinity chromatography procedures described 
below using an immunoaffinity column having an anti- 
body directed against the first polypeptide portion cou- 
pled thereto. 

[0412] The proteins encoded by the EST-related nu- 
cleic acids, positional segments of EST-related nucleic 
acids or fragments of positional segments of EST-relat- 
ed nucleic acids or the EST-related polypeptides, frag- 
ments of EST-related polypeptides, positional segments 
of EST-related polypeptides, or fragments of positional
segments of EST-related polypeptides may also be
used to generate antibodies as explained in Examples
20 and 33 in order to identify the tissue type or cell
species from which a sample is derived as described in
Example 48.

EXAMPLE 48 

Identification of Tissue Types or Cell Species by Means
of Labeled Tissue Specific Antibodies

[0413] Identification of specific tissues is accomplished
by the visualization of tissue specific antigens
by means of antibody preparations according to
Examples 20 and 33 which are conjugated, directly or
indirectly, to a detectable marker. Selected labeled antibody
species bind to their specific antigen binding partner in
tissue sections, cell suspensions, or in extracts of
soluble proteins from a tissue sample to provide a pattern
for qualitative or semi-qualitative interpretation.

[0414] Antisera for these procedures must have a
potency exceeding that of the native preparation, and for
that reason, antibodies are concentrated to a mg/ml level
by isolation of the gamma globulin fraction, for
example, by ion-exchange chromatography or by ammonium
sulfate fractionation. Also, to provide the most specific
antisera, unwanted antibodies, for example to common
proteins, must be removed from the gamma globulin
fraction, for example by means of insoluble
immunoabsorbents, before the antibodies are labeled with the
marker. Either monoclonal or heterologous antisera is
suitable for either procedure.

1. Immunohistochemical Techniques

[0415] Purified, high-titer antibodies, prepared as
described above, are conjugated to a detectable marker,
as described, for example, by Fudenberg, H., Chap. 26
in: Basic and Clinical Immunology, 3rd Ed. Lange, Los
Altos, California (1980) or Rose et al., Chap. 12 in:
Methods in Immunodiagnosis, 2d Ed. John Wiley and
Sons, New York (1980).

[0416] A fluorescent marker, either fluorescein or
rhodamine, is preferred, but antibodies can also be
labeled with an enzyme that supports a color producing
reaction with a substrate, such as horseradish
peroxidase. Markers can be added to tissue-bound antibody
in a second step, as described below. Alternatively, the
specific antitissue antibodies can be labeled with ferritin
or other electron dense particles, and localization of the
ferritin coupled antigen-antibody complexes achieved
by means of an electron microscope. In yet another
approach, the antibodies are radiolabeled, with, for
example, 125I, and detected by overlaying the antibody
treated preparation with photographic emulsion.

[0417] Preparations to carry out the procedures can
comprise monoclonal or polyclonal antibodies to a single
protein or peptide identified as specific to a tissue
type, for example, brain tissue, or antibody preparations
to several antigenically distinct tissue specific antigens
can be used in panels, independently or in mixtures, as
required.

[0418] Tissue sections and cell suspensions are
prepared for immunohistochemical examination according
to common histological techniques. Multiple cryostat
sections (about 4 μm, unfixed) of the unknown tissue
and known control are mounted and each slide covered
with different dilutions of the antibody preparation.
Sections of known and unknown tissues should also be
treated with preparations to provide a positive control,
a negative control, for example, pre-immune sera, and
a control for non-specific staining, for example, buffer.
[0419] Treated sections are incubated in a humid
chamber for 30 min at room temperature, rinsed, then
washed in buffer for 30-45 min. Excess fluid is blotted
away, and the marker developed.
[0420] If the tissue specific antibody was not labeled
in the first incubation, it can be labeled at this time in a
second antibody-antibody reaction, for example, by
adding fluorescein- or enzyme-conjugated antibody
against the immunoglobulin class of the
antiserum-producing species, for example, fluorescein labeled
antibody to mouse IgG. Such labeled sera are commercially
available.

[0421] The antigen found in the tissues by the above
procedure can be quantified by measuring the intensity
of color or fluorescence on the tissue section, and
calibrating that signal using appropriate standards.

2. Identification of Tissue Specific Soluble Proteins 

[0422] The visualization of tissue specific proteins
and identification of unknown tissues from that
procedure is carried out using the labeled antibody reagents
and detection strategy as described for
immunohistochemistry; however, the sample is prepared according
to an electrophoretic technique to distribute the proteins
extracted from the tissue in an orderly array on the basis
of molecular weight for detection.
[0423] A tissue sample is homogenized using a Virtis
apparatus; cell suspensions are disrupted by Dounce
homogenization or osmotic lysis, using detergents in
either case as required to disrupt cell membranes, as is
the practice in the art. Insoluble cell components such
as nuclei, microsomes, and membrane fragments are
removed by ultracentrifugation, and the soluble
protein-containing fraction concentrated if necessary and
reserved for analysis.
[0424] A sample of the soluble protein solution is
resolved into individual protein species by conventional
SDS polyacrylamide electrophoresis as described, for
example, by Davis, L. et al., Section 19-2 in: Basic
Methods in Molecular Biology (P. Leder, ed), Elsevier, New
York (1986), using a range of amounts of polyacrylamide
in a set of gels to resolve the entire molecular weight
range of proteins to be detected in the sample. A size
marker is run in parallel for purposes of estimating
molecular weights of the constituent proteins. Sample size
for analysis is a convenient volume of from 5 to 55 μl,
containing from about 1 to 100 μg protein. An aliquot
of each of the resolved proteins is transferred by blotting
to a nitrocellulose filter paper, a process that maintains
the pattern of resolution. Multiple copies are prepared.
The procedure, known as Western Blot Analysis, is well
described in Davis, L. et al., supra, Section 19-3. One
set of nitrocellulose blots is stained with Coomassie
Blue dye to visualize the entire set of proteins for
comparison with the antibody bound proteins. The remaining
nitrocellulose filters are then incubated with a solution
of one or more specific antisera to tissue specific
proteins prepared as described in Examples 20 and 33. In
this procedure, as in procedure 1 above, appropriate
positive and negative sample and reagent controls are
run.
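
Estimating molecular weights from a size marker run in parallel is commonly done by fitting the logarithm of marker molecular weight against migration distance and interpolating the unknown band. The sketch below is one minimal way to do this, offered as an assumption rather than as the method of the cited reference; the marker values and function names are invented for illustration and do not come from this specification.

```python
import math


def fit_log_mw(markers):
    """Least-squares fit of log10(MW) against relative mobility (Rf).

    markers -- list of (rf, mw_kda) pairs read from the size marker lane
    Returns (slope, intercept) of the fitted line.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(rf, slope, intercept):
    """Interpolate the molecular weight (kDa) of a band from its relative mobility."""
    return 10 ** (slope * rf + intercept)


# Hypothetical marker lane: (relative mobility, molecular weight in kDa).
marker_lane = [(0.2, 100.0), (0.6, 10.0)]
slope, intercept = fit_log_mw(marker_lane)
print(round(estimate_mw(0.4, slope, intercept), 2))
```

The log-linear relationship holds only over a limited range for a given gel percentage, which is one reason the text calls for a set of gels with different amounts of polyacrylamide.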

[0425] In either procedure described above a detect- 
able label can be attached to the primary tissue antigen- 
primary antibody complex according to various strate- 
gies and permutations thereof. In a straightforward ap- 
proach, the primary specific antibody can be labeled; al- 
ternatively, the unlabeled complex can be bound by a 
labeled secondary anti-IgG antibody. In other approach- 
es, either the primary or secondary antibody is conju- 
gated to a biotin molecule, which can, in a subsequent 
step, bind an avidin conjugated marker. According to yet 
another strategy, enzyme labeled or radioactive protein 
A, which has the property of binding to any IgG, is bound 
in a final step to either the primary or secondary anti- 
body. 

EXAMPLE 49 

Immunohistochemical Localization of Polypeptides 

[0426] The antibodies prepared as described in Ex- 
amples 20 and 33 above may be utilized to determine 
the cellular location of a polypeptide. The polypeptide 
may be any of the polypeptides encoded by EST-related 
nucleic acids, positional segments of EST-related nu- 
cleic acids or fragments of positional segments of EST- 
related nucleic acids or the polypeptide may be one of 
the EST-related polypeptides, fragments of EST-related 
polypeptides, positional segments of EST-related 
polypeptides, or fragments of positional segments of 
EST-related polypeptides. In some embodiments, the 
polypeptide may be a chimeric polypeptide such as 
those encoded by the fusion vectors of Example 47. 
[0427] Cells expressing the polypeptide to be local- 
ized are applied to a microscope slide and fixed using 
any of the procedures typically employed in immunohis- 
tochemical localization techniques, including the meth- 
ods described in Current Protocols in Molecular Biology, 
John Wiley and Sons, Inc. 1997. Following a washing 
step, the cells are contacted with the antibody. In some 
embodiments, the antibody is conjugated to a detectable
marker as described above to facilitate detection.
Alternatively, [...]

[...] the polypeptides encoded by EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids or [...] EST-related polypeptides, [...] or fragments
of positional segments of EST-related polypeptides, can
[...], or differentiated tumor tissue that has
metastasized [...]

[...] used in the immunoaffinity chromatography techniques
described below to isolate, purify or enrich the
polypeptides encoded by the EST-related nucleic acids,
[...] segments of [...] EST-related polypeptides. The
[...] EST-related nucleic acids or [...] polypeptides,
or fragments of positional segments of EST-related
polypeptides.

EXAMPLE 50 



Immunoaffinity Chromatography

[0431] Antibodies prepared as described above are
coupled to a support. Preferably, the antibodies are
monoclonal antibodies, but polyclonal antibodies may
[...] employed in immunoaffinity chromatography, including
Sepharose CL-4B (Pharmacia, Piscataway, NJ) [...] using
any of the coupling reagents typically used in
immunoaffinity chromatography, including cyanogen bromide.
After coupling the antibody to the support, the support
is contacted with a sample which contains a target
polypeptide whose isolation, purification or enrichment
[...] encoded by the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids, or the
target polypeptide may be one of the EST-related
polypeptides, fragments of EST-related polypeptides,
positional segments of EST-related polypeptides, or
fragments of positional segments of EST-related
polypeptides. The target polypeptides may also be
polypeptides which have been linked to the polypeptides
encoded by the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids, or the
target polypeptides may be polypeptides which have been
linked to EST-related polypeptides, fragments of
EST-related polypeptides, positional segments of
EST-related polypeptides, or fragments of positional
segments of EST-related polypeptides using the fusion
vectors described above.
[0433] Preferably, the sample is placed in contact with
the support for a sufficient amount of time and under
[...] polypeptide to specifically bind to the antibody
coupled [...]
[0434] Thereafter, the support is washed with an
appropriate wash solution to remove polypeptides which
have non-specifically adhered to the support. The wash
solution may be any of those typically employed in
immunoaffinity chromatography [...] chloride buffer
(0.1 M lysine base and [...] chloride, pH 8.0),
Tris-hydrochloride buffer (0.05 M Tris [...] Tris.Cl,
pH 8.0 or 9.0, 0.1% Triton X-100, and 0.5 M NaCl).
[0435] After washing, the specifically bound target
polypeptide is eluted from the support using the high pH
or low pH elution solutions typically employed in
immunoaffinity chromatography. In particular, the elution
solutions may contain an eluant such as triethanolamine,
[...] embodiments, the elution solution may also contain a
detergent such as Triton X-100 or octyl-β-D-glucoside.
[0436] The EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may also
be used to clone sequences located upstream of the
5' ESTs which are capable of regulating gene expression,
including promoter sequences, enhancer sequences,
and other upstream sequences which influence
transcription or translation levels. Once identified
and cloned, these upstream [...] sequences may
be used in expression vectors designed to [...]
expression of an inserted gene in a desired spatial,
temporal, developmental, or quantitative fashion. Example
51 describes a method for cloning sequences upstream
of the EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids.

2. Identification of upstream sequences with promoting
or regulatory activities

EXAMPLE 51

Use of EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to Clone
Upstream Sequences from Genomic DNA

[0437] Sequences derived from EST-related nucleic
acids, positional segments of EST-related nucleic acids
or fragments of positional segments of EST-related
nucleic acids may be used to isolate the promoters of the
corresponding genes using chromosome walking
techniques. In one chromosome walking technique, which
utilizes the GenomeWalker™ kit available from Clontech,
five complete genomic DNA samples are each
digested with a different restriction enzyme which has a 6
base recognition site and leaves a blunt end. Following
digestion, oligonucleotide adapters are ligated to each
end of the resulting genomic DNA fragments.
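
The digestion step above can be sketched in a few lines. The specification does not name the five enzymes, so the set below is an assumption: five well-known restriction enzymes that recognize a six-base palindromic site and cut at its midpoint, leaving blunt ends. The function name and the toy genomic sequence are hypothetical.

```python
# Assumed enzyme set: the text does not say which enzymes are used.
# Each site is a six-base palindrome cut at its midpoint, giving blunt ends.
BLUNT_CUTTERS = {
    "EcoRV": "GATATC",
    "DraI": "TTTAAA",
    "PvuII": "CAGCTG",
    "ScaI": "AGTACT",
    "StuI": "AGGCCT",
}


def digest(seq, site):
    """Cut seq at the midpoint of every occurrence of a blunt-cutter site."""
    seq = seq.upper()
    cut_offset = len(site) // 2
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    fragments = []
    previous = 0
    for cut in cuts:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


# One fragment pool per enzyme, mirroring the five separate digests.
genomic = "AAGATATCCCTTTAAAGG"  # toy sequence for illustration only
pools = {name: digest(genomic, site) for name, site in BLUNT_CUTTERS.items()}
print(pools["EcoRV"], pools["DraI"])
```

Because each enzyme cuts at different positions, the same gene-specific primer lies at a different distance from the nearest adapter in each pool, which is what lets the walk reach upstream sequence of varying lengths.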
[0438] For each of the five genomic DNA libraries, a
first PCR reaction is performed according to the
manufacturer's instructions using an outer adapter primer
provided in the kit and an outer gene specific primer. The
gene specific primer should be selected to be specific
for the 5' EST of interest and should have a melting
temperature, length, and location in the EST-related nucleic
acids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids which is consistent with its use in PCR reactions.
Each first PCR reaction contains 5 ng of genomic DNA,
5 μl of 10X Tth reaction buffer, 0.2 mM of each dNTP,
0.2 μM each of outer adapter primer and outer gene
specific primer, 1.1 mM of Mg(OAc)2, and 1 μl of the Tth
polymerase 50X mix in a total volume of 50 μl. The
reaction cycle for the first PCR reaction is as follows: 1
min at 94°C / 2 sec at 94°C, 3 min at 72°C (7 cycles) /
2 sec at 94°C, 3 min at 67°C (32 cycles) / 5 min at 67°C.
[0439] The product of the first PCR reaction is diluted
and used as a template for a second PCR reaction
according to the manufacturer's instructions using a pair
of nested primers which are located internally on the
amplicon resulting from the first PCR reaction. For
example, 5 μl of the reaction product of the first PCR reaction
mixture may be diluted 180 times. Reactions are made
in a 50 μl volume having a composition identical to that
of the first PCR reaction except the nested primers are
used. The first nested primer is specific for the adapter,
and is provided with the GenomeWalker™ kit. The
second nested primer is specific for the particular
EST-related nucleic acids, positional segments of EST-related
nucleic acids or fragments of positional segments of
EST-related nucleic acids for which the promoter is to
be cloned and should have a melting temperature,
length, and location in the EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids which is consistent with its use in PCR reactions.
The reaction parameters of the second PCR reaction
are as follows: 1 min at 94°C / 2 sec at 94°C, 3 min at
72°C (6 cycles) / 2 sec at 94°C, 3 min at 67°C (25 cycles)
/ 5 min at 67°C. The product of the second PCR
reaction is purified, cloned, and sequenced using standard
techniques.
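
The two cycling programs and the dilution step can be written down as data and checked arithmetically. The sketch below encodes the temperatures, durations, and cycle counts given in the text; the data layout and function name are illustrative assumptions, and the computed times cover only the programmed holds, not the time the instrument spends ramping between temperatures.

```python
# Each stage: (repeat count, [(temperature in deg C, hold time in seconds), ...]).
FIRST_PCR = [
    (1, [(94, 60)]),
    (7, [(94, 2), (72, 180)]),
    (32, [(94, 2), (67, 180)]),
    (1, [(67, 300)]),
]
SECOND_PCR = [
    (1, [(94, 60)]),
    (6, [(94, 2), (72, 180)]),
    (25, [(94, 2), (67, 180)]),
    (1, [(67, 300)]),
]


def hold_time_seconds(program):
    """Total programmed hold time of a cycling program, excluding ramp time."""
    return sum(repeats * sum(hold for _, hold in steps) for repeats, steps in program)


first_seconds = hold_time_seconds(FIRST_PCR)
second_seconds = hold_time_seconds(SECOND_PCR)
print(first_seconds / 60, second_seconds / 60)

# Diluting 5 microliters of first-round product 180-fold gives this final volume.
diluted_volume_ul = 5 * 180
print(diluted_volume_ul)
```

Both programs start with a small number of cycles at 72°C before dropping to 67°C, so the second program differs from the first only in its cycle counts.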

[0440] Alternatively, two or more human genomic
DNA libraries can be constructed by using two or more
restriction enzymes. The digested genomic DNA is
cloned into vectors which can be converted into single
stranded, circular, or linear DNA. A biotinylated
oligonucleotide comprising at least 15 nucleotides from the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids sequence is
hybridized to the single stranded DNA. Hybrids between the
biotinylated oligonucleotide and the single stranded
DNA containing the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments
of positional segments of EST-related nucleic acids are
isolated as described above. Thereafter, the single
stranded DNA containing the EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids is released from the beads and converted into
double stranded DNA using a primer specific for the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids or a primer
corresponding to a sequence included in the cloning vector.
The resulting double stranded DNA is transformed into
bacteria. cDNAs containing the EST-related nucleic
acids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids are identified by colony PCR or colony
hybridization.

[0441] Once the upstream genomic sequences have
been cloned and sequenced as described above,
prospective promoters and transcription start sites within
the upstream sequences may be identified by comparing
the sequences upstream of the EST-related nucleic
acids, positional segments of EST-related nucleic acids
or fragments of positional segments of EST-related
nucleic acids with databases containing known
transcription start sites, transcription factor binding sites, or
promoter sequences.

[0442] In addition, promoters in the upstream
sequences may be identified using promoter reporter
vectors as described in Example 53.

EXAMPLE 53

[...] Sequences

[...] segments of EST-related nucleic acids are [...]
or linker scanning to obliterate potential transcription
factor binding sites within the promoter individually or in
combination. The effects of these mutations on
transcription levels may be determined by inserting the
mutations into the cloning sites in the promoter reporter
vectors.

EXAMPLE 54 

Cloning and Identification of Promoters

[...] (SEQ ID NO: 17) was obtained.

[...] Using the primer pairs GTA CCAGGGG ACT
[.]TG ACC ATT GC (SEQ ID NO: 18) and CTG TGA CCA
[...], the promoter having the internal designation P15B4
(SEQ ID NO: 20) was obtained.
[0448] Using the primer pairs CTG GGA TGG AA[.]
GCA CGG TA (SEQ ID NO: 21) and GAG ACC ACA
CAG CTA GAC AA (SEQ ID NO: 22), the promoter
having the internal designation P29BG (SEQ ID NO: 23) was
obtained.
[0449] Figure 4 provides a schematic description of
[...] with the corresponding 5' tags. The upstream
sequences were screened for the presence of motifs
resembling transcription factor binding sites or known
transcription start sites using the computer program
MatInspector [...].
[0450] Table [...] describes the transcription factor
binding sites present in each of these promoters. The
column labeled "matrice" provides the name of the
MatInspector matrix used. The column labeled "position"
provides the 5' position of the promoter site. Numeration of
the sequence starts from the transcription site as
determined by matching the genomic sequence with the 5'
EST sequence. The column labeled "orientation"
indicates the DNA strand on which the site is found, with
the + strand being the coding strand as determined by
matching the genomic sequence with the sequence of
the 5' EST. The column labeled "score" provides the
MatInspector score found for this site. The column
labeled "length" provides the length of the site in
nucleotides. The column labeled "sequence" provides the
sequence of the site found.
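
The columns described above (matrix name, position relative to the transcription start, orientation, score, length, and sequence) can be illustrated with a simplified matrix scan. This sketch is not MatInspector and does not reproduce its scoring; the toy matrix, the normalized score, the threshold, and all names are assumptions made only to show how such a table row could be derived from an upstream sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Toy four-position frequency matrix (hypothetical, not a real MatInspector matrix).
TOY_MATRIX = [
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
]


def matrix_score(window, matrix):
    """Sum of matrix values for the window, divided by the best possible sum."""
    best = sum(max(column.values()) for column in matrix)
    return sum(column.get(base, 0.0) for column, base in zip(matrix, window)) / best


def scan(upstream, matrix, name, threshold=0.9):
    """Scan both strands of an upstream sequence whose last base is position -1."""
    upstream = upstream.upper()
    width = len(matrix)
    hits = []
    for i in range(len(upstream) - width + 1):
        window = upstream[i:i + width]
        for strand, site in (("+", window), ("-", revcomp(window))):
            score = matrix_score(site, matrix)
            if score >= threshold:
                hits.append({
                    "matrix": name,
                    "position": i - len(upstream),  # 5' end of the window
                    "orientation": strand,
                    "score": round(score, 3),
                    "length": width,
                    "sequence": site,
                })
    return hits


hits = scan("AAGATAAATATCAA", TOY_MATRIX, "toy_GATA")
for hit in hits:
    print(hit)
```

In this sketch the position is always the 5' end of the matched window on the coding strand, counted backwards from the transcription start; a real tool may report positions differently for sites on the opposite strand.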
[0451] Bacterial clones containing plasmids containing
the promoter sequences described above are presently
stored in the inventor's laboratories under the internal
identification numbers provided above. The inserts may
be recovered from the deposited materials by growing
an aliquot of the appropriate bacterial clone in the
appropriate medium. The plasmid DNA can then be
isolated using plasmid isolation procedures familiar to
those skilled in the art. Alternatively, a PCR can [...]
designed at both ends [...] nucleic acids, positional
segments of [...] EST-related nucleic acids or fragments
of positional [...] EST-related nucleic acids can then be
[...] sequences located upstream of the EST-related
nucleic acids, [...] EST-related nucleic acids or [...]

[...] related nucleic acids, positional segments of
EST-related [...]

[0453] Preferably, the desired promoter is placed near
[...] vector able to drive expression of the inserted [...]
or Bovine Papilloma Virus, backbones from [...]
as described in Example 55 below.



EXAMPLE 55

Identification of Proteins Which Interact with Promoter
Sequences, Upstream Regulatory Sequences, or
mRNA



[0454] Sequences within the promoter region which
are likely to bind transcription factors may be identified
by homology to known transcription factor binding sites
or through conventional mutagenesis or deletion
analyses of reporter plasmids containing the promoter
sequence. For example, deletions may be made in a
reporter plasmid containing the promoter sequence of
interest operably linked to an assayable reporter gene.
The reporter plasmids carrying various deletions within
the promoter region are transfected into an appropriate
host cell and the effects of the deletions on expression
levels are assessed. Transcription factor binding sites
within the regions in which deletions reduce expression
levels may be further localized using site directed
mutagenesis, linker scanning analysis, or other techniques
familiar to those skilled in the art.
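
The logic of the deletion analysis, flagging the regions whose removal reduces reporter expression, can be sketched as a simple comparison against the full-length construct. The coordinates, activity values, cutoff, and function name below are all invented for illustration and are not data from this specification.

```python
def regions_needed_for_expression(full_length_activity, deletions, max_fraction=0.5):
    """Return deleted regions whose loss drops reporter activity to max_fraction or less.

    full_length_activity -- reporter activity of the undeleted promoter construct
    deletions -- list of (start, end, activity); coordinates are relative to the
                 transcription start site, activity is from the deletion construct
    max_fraction -- hypothetical cutoff, as a fraction of full-length activity
    """
    cutoff = full_length_activity * max_fraction
    return [(start, end) for start, end, activity in deletions if activity <= cutoff]


# Made-up reporter readings for four deletion constructs (arbitrary units).
deletion_constructs = [
    (-500, -400, 95.0),
    (-400, -300, 30.0),
    (-300, -200, 102.0),
    (-200, -100, 12.0),
]
candidates = regions_needed_for_expression(100.0, deletion_constructs)
print(candidates)
```

Regions flagged this way are only candidates; as the text notes, the binding sites within them still have to be localized by site directed mutagenesis or linker scanning.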
[0455] Nucleic acids encoding proteins which interact
with sequences in the promoter may be identified using
one-hybrid systems such as those described in the
manual [...] kit available from Clontech (Catalog No. K1603-1).
Briefly, the Matchmaker One-hybrid system is used as
follows. The target sequence for which it is desired to
[...] reporter gene and integrated into the yeast genome.
Preferably, multiple copies of the target sequences are
inserted into the reporter plasmid in tandem. A library
comprised of fusions between cDNAs to be evaluated
for the ability to bind to the promoter and the activation
domain of a yeast transcription factor, such as GAL4, is
transformed into the yeast strain containing the integrated
reporter sequence. The yeast are plated on selective
media to select cells expressing the selectable marker
linked to the promoter sequence. The colonies which
grow on the selective media contain genes encoding
proteins which bind the target sequence. The inserts in
the genes encoding the fusion proteins are further
characterized by sequencing. In addition, the inserts may be
[...] vectors. Binding of the polypeptides encoded by the
inserts to the promoter DNA may be confirmed by
techniques familiar to those skilled in the art, such as gel
shift analysis or DNAse protection analysis.

VII. Use of EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids in Gene Therapy

[0456] The present invention also comprises the use of EST-related
nucleic acids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic acids in gene
therapy

Ssss==ss=s=E 

SrC—. «■»* * ° ""SEES 

acid may be incorporated in a ribozyme capable of specifically
cleaving the target mRNA.

EXAMPLE 56 

Preparation and Use of Antisense Oligonucleotides



The antisense nucleic acid molecules to be used in gene therapy may be
either DNA or RNA sequences. They may comprise a sequence

obtained from a nucleotide sequence encoding a protein by reversing
the orientation of the coding region with respect to a promoter so as
to transcribe the opposite strand from that which is normally
transcribed in the cell. The antisense molecules may be transcribed
using in vitro transcription systems such as those which employ T7 or
SP6 polymerase to generate the transcript. Another approach involves
transcription of the antisense nucleic acids in vivo by operably
linking DNA containing

[0461] Alternatively, oligonucleotides which are complementary to the
strand normally transcribed in the cell may be synthesized in vitro.
Thus, the antisense nucleic acids are complementary to the
corresponding mRNA and are capable of hybridizing to the mRNA to
create a duplex. In some embodiments, the antisense sequences may
contain modified sugar phosphate backbones to increase stability and
make them less sensitive to RNase activity. Examples of modifications
suitable for use in antisense strategies are described by Rossi et
al., Pharmacol. Ther. 50(2):245-254, (1991).
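
As an illustration of the complementarity described above, the following is a minimal sketch of deriving an antisense oligonucleotide sequence from a stretch of target mRNA. The function name `antisense` and the example sequence are hypothetical and are not part of the disclosure; the sketch only computes the reverse complement and does not model backbone modifications, melting temperature, or duplex stability.

```python
def antisense(mrna, as_dna=True):
    """Return the antisense strand, written 5'->3', for an mRNA region.

    Hypothetical helper for illustration only. With as_dna=True the
    result is an oligodeoxynucleotide (A pairs with T); otherwise it
    is an RNA oligonucleotide (A pairs with U).
    """
    target = mrna.upper().replace("T", "U")  # accept cDNA-style input
    pairs = {"A": "T" if as_dna else "U", "U": "A", "G": "C", "C": "G"}
    try:
        # Reverse the target, then complement each base.
        return "".join(pairs[base] for base in reversed(target))
    except KeyError as exc:
        raise ValueError(f"unexpected base: {exc.args[0]}") from None


if __name__ == "__main__":
    print(antisense("AUGGCC"))                # DNA antisense oligo
    print(antisense("AUGGCC", as_dna=False))  # RNA antisense oligo
```

Reversing before complementing keeps the output in the conventional 5' to 3' orientation, which is how an oligonucleotide would normally be specified for synthesis.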
[0462] Various types of antisense oligonucleotides complementary to
the sequence of the EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids may be used. In one preferred embodiment,
stable and semi-stable antisense oligonucleotides described in
International Application No. PCT WO94/23026 are used. In these
molecules, the 3' end or both the 3' and 5' ends are engaged in
intramolecular hydrogen bonding between complementary base pairs.
These molecules are better able to withstand exonuclease attacks and
exhibit increased stability compared to conventional antisense
oligonucleotides.

[0463] In another preferred embodiment, the antisense
oligodeoxynucleotides against herpes simplex virus types 1 and 2
described in International Application No. WO 95/04141 are used.
[0464] In yet another preferred embodiment, the covalently
cross-linked antisense oligonucleotides described in International
Application No. WO 96/31523 are used. These double- or single-stranded
oligonucleotides comprise one or more, respectively, inter- or
intra-oligonucleotide covalent cross-linkages, wherein the linkage
consists of an amide bond between a primary amine group of one strand
and a carboxyl group of the other strand or of the same strand,
respectively, the primary amine group being directly substituted in
the 2' position of the strand nucleotide monosaccharide ring, and the
carboxyl group being carried by an aliphatic spacer group substituted
on a nucleotide or nucleotide analog of the other strand or the same
strand, respectively.
[0465] The antisense oligodeoxynucleotides and oligonucleotides
disclosed in International Application No. WO 92/18522 may also be
used. These molecules are stable to degradation and contain at least
one transcription control recognition sequence which binds to control
proteins and are effective as decoys therefor. These molecules may
contain "hairpin" structures, "dumbbell" structures, "modified
dumbbell" structures, "cross-linked" decoy structures and "loop"
structures.
[0466] In another preferred embodiment, the cyclic double-stranded
oligonucleotides described in European Patent Application No. 0 572
287 A2 are used. These ligated oligonucleotide "dumbbells" contain the
binding site for a transcription factor and inhibit expression of the
gene under control of the transcription factor by sequestering the
factor.
[0467] Use of the closed antisense oligonucleotides disclosed in
International Application No. WO 92/19732 is also contemplated.
Because these molecules have no free ends, they are more resistant to
degradation by exonucleases than are conventional oligonucleotides.
These oligonucleotides may be multifunctional, interacting with
several regions which are not adjacent to the target mRNA.

[0468] The appropriate level of antisense nucleic acids required to

mi ned using /n ^^'^^Tells by dilfusion. 
molecule rray ^ntrod r ^; s . mg p ' r<Kedu(e s 
injection, infection or ira , se nuc leic ac- 

known in the a bare or naked 

ids can bo introduced into the boaya ( 

°' 35 The express** vec 

SrrS- may ^-"^ onto 

[0469] ™eaf r ^^ 

cell samples at a ^^ num °^ Mto , xn0 -4M.Oncelhern,n- 
prelerably between 1x10 mk ' , rol g ene 

Snum concentration that can ^ a ^ eistrans , a ,ed 
expressioniskJenWied theoptrnrzed an 

into a dosage suitable lor use n wes in . 

to a dose ol approximately 0.6 «W* » 

and ^^ h W J!^SSS-ith. antisense ol- 

[04701 It ' s,urther ^ s m ^ ated into a ribozyme 
igonucleotidesequencejsincorpo ^.^ ^ 

sequence to Technical applications 

po.ypept.de encoded by me 9 e nhibjlion on lran s.a- 
that the effectiveness ol ^® 8n ' h , jnclude but 

are no. ™f o -dolabellng. 
and EU ^t"S acids, positional seg- 
104721 ,«T related nudL acids or fragments o. fo- 
ments of EST-re'ate £ nudeic acids may also 
sitional segments of ES I ™* d on in tracel- 
be us ed in gene therapv "^^jgonud^ 
iularlriple helix formation * Tney 
are used to in cell ac- 
are particularly uselul lor stur ^ ES T- 
tivtty as it isassociated ^*f*££* of EST-relat- 
re Jted nucleic acids, f^Jg^Z segments of 
eo nuclei acidso, fragment of po^ .^.^ m 

EST-relaled nucle.c acids o hep ^ be 

more prelerably. a ^ rt » n ° ' h ~ nd ^ dua ,s having dis- 
ced, o inhibit gene express^ g(jne 

eases -^^^.S^po-Uon." seg- 
Similarly acids o, fragments of fo- 

ments ol EST -re at* ° " , eic acids can be 



*hio a cell Traditionally, homopurim 
particular gene me most useful fo, trip* 

sequences «™ ««* Xmopyrimidine sequence, 
helix strategies. However, nom « homopyrimidin* 

5 oligonucleotides bind to the map y tf 

sequences rom thB EST « ^ ^ (fagmente 

al segments of EST ^ ela eo ' acids m 



EXAMPLE 57 

P ie£ ar*ionar^^ 

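[0473] The sequences of the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids are scanned to identify 10-mer
to 20-mer homopyrimidine or homopurine stretches which could be used
in triple-helix based strategies for inhibiting gene expression.
Following identification of candidate homopyrimidine or homopurine
stretches, their efficiency in inhibiting gene expression is assessed
by introducing varying amounts of oligonucleotides containing the
candidate sequences into tissue culture cells which normally express
the target gene. The oligonucleotides may be obtained commercially,
for example from GENSET, Paris, France.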
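
The scan for homopurine or homopyrimidine stretches used in triple-helix strategies can be sketched as follows. This is only an illustrative assumption about how such a scan might be written: the function name `find_stretches`, the default 10 to 20 nucleotide window, and the test sequence are hypothetical, and runs longer than the maximum are simply truncated to their first `max_len` bases.

```python
import re


def find_stretches(seq, min_len=10, max_len=20):
    """List purine-only or pyrimidine-only runs in a DNA sequence.

    Returns (start_index, stretch, kind) tuples sorted by position.
    Illustrative sketch only; start_index is 0-based.
    """
    seq = seq.upper()
    hits = []
    patterns = (("homopurine", r"[AG]+"), ("homopyrimidine", r"[CT]+"))
    for kind, pattern in patterns:
        for match in re.finditer(pattern, seq):
            run = match.group()
            if len(run) >= min_len:
                # Report at most max_len bases of a long run.
                hits.append((match.start(), run[:max_len], kind))
    return sorted(hits)


if __name__ == "__main__":
    example = "ACGT" + "AGGAGGAAGAGG" + "CA" + "CTTCCTCTTC" + "G"
    for start, stretch, kind in find_stretches(example):
        print(start, kind, stretch)
```

Candidates found this way would still need the experimental assessment of inhibition efficiency described in this example.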

[0474] The oligonucleotides may be introduced into the cells using a
variety of methods familiar to those skilled in the art, including but
not limited to calcium phosphate precipitation and liposome-mediated
transfection.

[0475] Treated ce lis ^ techniques 
Ion ction or reduced gene £ as says, or 

such as Northern **, , rans c,iption levels 

0 , the targe, gene in erf. t* {q ^ m0( , t0(ed 

me oligonucleotide^The cell lun ma (arge( 

are predicted bB^^££5d n«Wc acids, 
genes corresponding to met^ ^ o , 

« positional segments of BT« , ated nuc | eic 

tragmentsol positional segments ol tb 

Js from which ^«|^^^«-'- Ih 
known gene ^^sequences ...^ 

so dieted based on me presence o rticular in . 

within cefls derived ^"f^Z^n*!** 

techniques described herein. jn jn _ 
above and in Exa m ^Kat^d 9 n 5g

on the in w.ro results ^.^ t " lne natural (beta) ano- 
[0477] in some ^^" ,s . be „ laced with 

me rs ot me ^9° nuC '^ n Nucleotide more re- 
alph aanome,s to render the ^ agent 

sistant to nucleases. Further « bB attached 

such as ethidium ,o stable the 

to me 3' end ol generation o. oigo- 



EXAMPLE 58 




tv. c<st related nucleic acids, positional seg- 
[0478] The EST-relatea n f(agme nts ol po- 

Inen.so.EST.re.a^dnuc 

sitionalsegmentsol EST££ed ^ polypepUde 

be used to express £ eflecl ln a0 . 
in ahosl organism to produce d ^ pep . 

ditto n, nucleR: ac«s polypeptides 
tides, positional segments ol to ES T-related 

o, iragments o. t°^J%^£**a)mir>+ 

teln or polypeptide in a host org 

eficialeltect. r , ures tne encoded protein or 
,0479] In ^P'^lUressed-.thehostor- 
poiypeptiden^ybeuans^y p ^ ^ 

ganism or stably jessed i in ^ ^ ^ 
encoded protein or P or,pept.de may ha^^ y^ ^ 

tivif.es described *ov«; ' " ^^pUde which the 
polypeptide may be eneod ed pro- 

tein may augment me e»s. a 
host organism. , in wn ich the protein or 

[0480] in some acids encoding the lull 

polypeptide is pe p .«e and the mature 

,ength protein i.e me ma , u re pro- 

protein), o, ^ the signal peptide 

tein (i.e. the protein 9*™'^ nos , orgar , te m. 
is cleaved om is introduced . nto <n^ g ^ 

10481! The ^^ in ^VX?nehos,or g anism 
po ly pept,des may be int odu ^ q| bU| ,„ 

mereby producing a benelicia ^ ^ 
[0482] Allematively. the nucleic a 
protein or polypeptide ^,7^^^ is act.e inthe 



lor ma y be direct* n» ^ ^ ^ 

sucn that the encoder^ | p»*n» P^ ^ ^ 

° r9an L S me 0 ex^n Sector may be Wodueed H. 
s proach. ^^ortainhg , he expression vector are 

70 

EXAMPLE 59 

UseotSiorjalPexii^^ 

importapepWeorap^no. merest ^ 
into tissue culture cells (Lin el a/., J. 
» 14225-1«se(l9B5);Du--.. AW 1ft 
235-243 (1998); Bo|as et a/., Narur 

SS" Ccel. permeable 

25 cated across cell " ie ™ ,an °' . to eitner the C-ter- 
be used in order to add the h «9» < 

Alternatively, when longe pep desor p ^ 
sported into cells. ^ c a ^Xto those SKiUed In 
oo gheered, us.g techmq es am* 

the art, in order to link Jta > e» q( g DNA 

sequence coding tor a cargo po w v ejther 

using conventional techn ques to p ^ 

ce „ permeable P"^J^ mw *..po 1 yp.p- 

men simply incubated * ' > «i membrane . 
tide which is then w™*^ , 0 study diverse 

„ 10485] ™^™^^«^ For* 
intracellular ^^f^M* relevant 
stance, it has been useo to p. axam i ne protein- 

domainsotin.race^ 

p.oteininteract^ns^vorved. sign ^ ^ 
45 ways (Un 91 at supra._Ui « chen , p 27 1; 
5305-530B (1996), ft*. * ■ * ■ ^ » „ ^ Scf . 

duced into the host organism. peptides ot 

55 P487] AM^^^tcttSon^ 

rrer=s^ 
oligonucleotides or oligonucleotides designed to form triple helixes
as described above, in order to inhibit processing and maturation of a
target cellular RNA.

EXAMPLE 60 



Computer Embodiments

[0488] As used herein the term "nucleic acid codes of SEQ ID NOs:
24-4100 and 8178-36681" encompasses the nucleotide sequences of SEQ ID
NOs: 24-4100 and 8178-36681, fragments of SEQ ID NOs: 24-4100 and
8178-36681, nucleotide sequences homologous to SEQ ID NOs: 24-4100 and
8178-36681 or homologous to fragments of SEQ ID NOs: 24-4100 and
8178-36681, and sequences complementary to all of the preceding
sequences. The fragments include portions of SEQ ID NOs: 24-4100 and
8178-36681 comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75,
100, 150, 200, 300, 400,

and 8178-36681. Preferably, the fragments are novel fragments.
Homologous sequences and fragments of SEQ ID NOs: 24-4100 and
8178-36681 refer to a sequence having at least 99%, 98%, 97%, 96%,
95%, 90%, 85%, 80%, or 75% homology to these sequences. Homology may
be determined using any of the computer programs and parameters
described herein, including BLAST2N with the default parameters or
with any modified parameters. Homologous sequences also include RNA
sequences in which uridines replace the thymines in the nucleic acid
codes of SEQ ID NOs: 24-4100 and 8178-36681. The homologous sequences
may be obtained using any of the procedures described herein or may
result from the correction of a sequencing error as described above.
It will be appreciated that the nucleic acid codes of SEQ ID NOs:
24-4100 and 8178-36681 can be represented in the traditional single
character format (See the inside back cover of Stryer, Lubert.
Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any
other format which records the identity of the nucleotides in a
sequence.
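
Two of the derived forms covered by this definition, fragments of a given number of consecutive nucleotides and RNA versions in which uridines replace thymines, can be illustrated with a short sketch operating on the single character format. The function names `to_rna` and `fragments` and the toy sequence are hypothetical and chosen only for illustration.

```python
def to_rna(dna):
    """Return the RNA form of a DNA sequence (uridine replaces thymine)."""
    return dna.upper().replace("T", "U")


def fragments(seq, length):
    """Return every fragment of `length` consecutive residues of `seq`.

    Returns an empty list when the sequence is shorter than `length`.
    """
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


if __name__ == "__main__":
    toy = "ATGCA"  # toy sequence, not one of the SEQ ID NOs
    print(to_rna(toy))
    print(fragments(toy, 3))
```

The same `fragments` helper applies unchanged to polypeptide sequences in single character format, since it only slices a string.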
[0489] As used herein the term "polypeptide codes of SEQ ID NOs:
4101-8177" encompasses the polypeptide sequence of SEQ ID NOs:
4101-8177 which are encoded by the 5' ESTs of SEQ ID NOs: 24-4100 and
8178-36681, polypeptide sequences homologous to the polypeptides of
SEQ ID NOs: 4101-8177, or fragments of any of the preceding sequences.
Homologous polypeptide sequences refer to a polypeptide sequence
having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, 75% homology
to one of the polypeptide sequences of SEQ ID NOs: 4101-8177. Homology
may be determined using any of the computer programs and parameters
described herein, including FASTA with the default parameters or with
any modified parameters. The homologous sequences may be obtained
using any of the procedures described herein or may result from the
correction of a sequencing error as described above. The polypeptide
fragments comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75,
100, or 150 consecutive amino acids of the polypeptides of SEQ ID NOs:
4101-8177. Preferably, the fragments are novel fragments. It will be
appreciated that the polypeptide codes of the SEQ ID NOs: 4101-8177
can be represented in the traditional single character format or three
letter format (See the inside back cover of Stryer, Lubert.
Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any
other format which relates the identity of the polypeptides in a
sequence.

[0490] It will be appreciated by those skilled in the art that the
nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 and
polypeptide codes of SEQ ID NOs: 4101-8177 can be stored, recorded,
and manipulated on any medium which can be read and accessed by a
computer. As used herein, the words "recorded" and "stored" refer to a
process for storing information on a computer medium. A skilled
artisan can readily adopt any of the presently known methods for
recording information on a computer readable medium to generate
manufactures comprising one or more of the nucleic acid codes of SEQ
ID NOs: 24-4100 and 8178-36681, or one or more of the polypeptide
codes of SEQ ID NOs: 4101-8177. Another aspect of the present
invention is a computer readable medium having recorded thereon at
least 2, 5, 10, 15, 20, 25, 30, or 50 nucleic acid codes of SEQ ID
NOs: 24-4100 and 8178-36681. Another aspect of the present invention
is a computer readable medium having recorded thereon at least 2, 5,
10, 15, 20, 25, 30, or 50 polypeptide codes of SEQ ID NOs: 4101-8177.
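
As one possible illustration of recording sequence codes on a computer readable medium, the sketch below writes and reads sequences as plain text in the widely used FASTA layout. The choice of FASTA, the function names `write_fasta` and `read_fasta`, and the record names are assumptions made for this example; the text above does not prescribe any particular file format.

```python
import io


def write_fasta(records, handle, width=60):
    """Write {name: sequence} records to a text handle in FASTA layout."""
    for name, seq in records.items():
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def read_fasta(handle):
    """Read FASTA text from a handle back into a {name: sequence} dict."""
    records, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


if __name__ == "__main__":
    # An in-memory buffer stands in for a file on disc.
    buffer = io.StringIO()
    write_fasta({"example_nucleic_acid": "ACGT" * 20}, buffer)
    buffer.seek(0)
    print(read_fasta(buffer))
```

Any handle that supports text writes and line iteration can be substituted for the in-memory buffer, for example a file opened with `open(path, "w")`.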

[0491] Computer readable media include magnetically readable media,
optically readable media, electronically readable media and
magnetic/optical media. For example, the computer readable media may
be a hard disc, a floppy disc, a magnetic tape, CD-ROM, DVD, RAM, or
ROM as well as other types of other media known to those skilled in
the art.
[0492] Embodiments of the present invention include systems,
particularly computer systems which contain the sequence information
described herein. As used herein, "a computer system" refers to the
hardware components, software components, and data storage components
used to analyze the nucleotide sequences of the nucleic acid codes of
SEQ ID NOs: 24-4100 and 8178-36681, or the amino acid sequences of the
polypeptide codes of SEQ ID NOs: 4101-8177. The computer system
preferably includes the computer readable media described above, and a
processor for accessing and manipulating the sequence data.
[0493] Preferably, the computer is a general purpose system that
comprises a central processing unit (CPU), one or more data storage
components for storing data, and one or more data retrieving devices
for retrieving the data stored on the data storage components. A
skilled artisan can readily appreciate that any one of the currently
available computer systems is suitable.
[0494] In one particular embodiment, the computer

£222=5= 
=£=3=5225 

and modeting tools etc.) may res.de >n mammem- 
KfTSS Ibodiments, , h e 

ing the above-described nucleic acid codes of SEQ ID NOs: 24-4100 and
8178-36681 or polypeptide codes of SEQ ID NOs: 4101-8177 stored on a
computer readable

s=J~===5r52= 

reansFofexampe,.he sequence comparermay com- 

rSS=55S==: 

for use in this aspect of the invention.
[0496] Accordingly, one aspect of the present invention

nf SEQ ID NOs: 24-4100 and 8178-36681 or a 
code of SEQ UJ ^ 4101-8177, a data 

to be compared to the nucleic acid code of SEQ ID NOs: 24-4100 and
8178-36681 or polypeptide code of SEQ ID NOs: 4101-8177 and a sequence
comparer for conducting the comparison. The sequence comparer may
indicate a homology level between the sequences compared or identify
structural motifs in the above described nucleic acid code of SEQ ID
NOs: 24-4100 and 8178-36681 and polypeptide codes of SEQ ID NOs:
4101-8177 or it may identify structural motifs in sequences which are
compared to these nucleic acid codes and polypeptide codes. In some
embodiments, the data storage device may have stored thereon the
sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic
acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or polypeptide codes
of SEQ ID NOs: 4101-8177.

[0497] Another aspect of the present invention is a method for
determining the level of homology between a nucleic acid code of SEQ
ID NOs: 24-4100 and 8178-36681 and a reference nucleotide sequence,
comprising the steps of reading the nucleic acid code and the
reference nucleotide sequence through the use of a computer program
which determines homology levels and determining homology between the
nucleic acid code and the reference nucleotide sequence with the
computer program. The computer program may be any of a number of
computer programs for determining homology levels, including those
specifically enumerated herein, including BLAST2N with the default
parameters or with any modified parameters. The method may be
implemented using the computer systems described above. The method may
also be performed by reading 2, 5, 10, 15, 20, 25, 30, or 50 of the
above described nucleic acid codes of SEQ ID NOs: 24-4100 and
8178-36681 through use of the computer program and determining
homology between the nucleic acid codes and reference nucleotide
sequences.
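
The notion of a homology level expressed as a percentage can be illustrated with a deliberately simple sketch. It is not BLAST2N and performs no alignment or scoring of its own: it assumes the two sequences have already been aligned to equal length, with "-" marking gaps, and reports the percentage of identical positions among the ungapped columns. The function name `percent_identity` and the example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap ("-") are ignored.
    Simplified illustration only, not a substitute for BLAST2N.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACGTACGT", "ACGTACGA"))
```

A value from such a calculation could then be compared against thresholds such as the 75% to 99% levels recited for homologous sequences.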

[0498] Alternatively, the computer program may be a computer program
which compares the nucleotide sequences of the nucleic acid codes of
the present invention to reference nucleotide sequences in order to
determine whether the nucleic acid code of SEQ ID NOs: 24-4100 and
8178-36681 differs from a reference nucleic acid sequence at one or
more positions. Optionally,

nucleotide sequences of the nucleic acid codes of SEQ ID NOs: 24-4100
and 8178-36681 contain a single nucleotide polymorphism (SNP) with
respect to a reference nucleotide sequence. This single nucleotide
polymorphism may comprise a single base substitution, insertion, or
deletion.
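
A minimal sketch of the position-by-position comparison described above is given below. It handles only single base substitutions between two sequences of equal length; detecting insertions or deletions would require an alignment step that is not shown. The function name `substitutions` and the example sequences are hypothetical.

```python
def substitutions(code, reference):
    """List positions where `code` differs from `reference`.

    Returns (position, reference_base, code_base) tuples, with
    positions counted from 1. Both sequences must have equal length,
    so insertions and deletions are outside the scope of this sketch.
    """
    if len(code) != len(reference):
        raise ValueError("equal-length sequences are required")
    return [
        (index + 1, ref_base, code_base)
        for index, (code_base, ref_base) in enumerate(
            zip(code.upper(), reference.upper())
        )
        if code_base != ref_base
    ]


if __name__ == "__main__":
    print(substitutions("ACGTTA", "ACGATA"))
```

A result containing exactly one tuple would correspond to a candidate single base substitution relative to the reference.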
[0499] Another aspect of the present invention is a method for
determining the level of homology between a polypeptide code of SEQ ID
NOs: 4101-8177 and a reference polypeptide sequence, comprising the
steps

sequence using the computer program.
mm Accordingly, another aspect o. the present .n- 
S * a method .or determining 

nf <5EQ ID NOs: 24-4100 and 8178-36681 an 

SSSTln other embodiments the computer based 

Lures within the nucleotide sequences 

•.. „h« of =560 ID NOS. 24-4100 and 8178-366B1 
T*Z£ acfdCences o, the polypeptide codes 

^identifies certain .ea.ures within the above^- 

=£±«^"=£ 

SI 3-dimensional structure o. the P*P*I**« 

?<£es of SEQ ID NOs: 4101-8177. In some embodi- 
ments, me molecular modeling prog^ 
Sequences that are most compatible with profile ^pre- 
senting *e structure, environments o. the residues in 

mens^a. structures ol proteins in a grven familyare 
<o **» *• structurally conserved re- 



(See e.g., Srinivasan, et al., U.S. Patent No. 5,557,535 issued
September 17, 1996). Conventional homology modeling techniques have
been used routinely to build models of proteases and antibodies.
(Sowdhamini et al., Protein Engineering 10:207, 215 (1997)).
Comparative approaches can also be used to develop three-dimensional
protein models when the protein of interest has poor sequence identity
to template proteins. In some cases, proteins fold into similar
three-dimensional structures despite having very weak sequence
identities. For example, the three-dimensional structures of a number
of helical cytokines fold in similar three-dimensional topology in
spite of weak sequence homology.
[0504] The recent development of threading methods now enables the
identification of likely folding patterns in a number of situations
where the structural relatedness between target and template(s) is not
detectable at the sequence level. Hybrid methods, in which fold
recognition is performed using Multiple Sequence Threading (MST),
structural equivalencies are deduced from the threading output using a
distance geometry program DRAGON to construct a low resolution model,
and a full-atom representation is constructed using a molecular
modeling package such as QUANTA.
[0505] According to this 3-step approach, candidate templates are
first identified by using the novel fold recognition algorithm MST,
which is capable of performing simultaneous threading of multiple
aligned sequences onto one or more 3-D structures. In a second step,
the structural equivalencies obtained from the MST output are
converted into interresidue distance restraints and fed into the
distance geometry program DRAGON, together with auxiliary information
obtained from secondary structure predictions. The program combines
the restraints in an unbiased manner and rapidly generates a large
number of low resolution model conformations. In a third step, these
low resolution model conformations are converted into full-atom models
and subjected to energy minimization using the molecular modeling
package QUANTA. (See e.g., Aszodi et al., Proteins: Structure,
Function, and Genetics, Supplement 1:38-42
[0506] The results of the molecular modeling analysis may then be used
in rational drug design techniques to identify agents which modulate
the activity of the polypeptide codes of SEQ ID NOs: 4101-8177.
[0507] Accordingly, another aspect of the present invention is a
method of identifying a feature within the nucleic acid codes of SEQ
ID NOs: 24-4100 and 8178-36681 or the polypeptide codes of SEQ ID NOs:
4101-8177 comprising reading the nucleic acid code(s) or the
polypeptide code(s) through the use of a computer program which
identifies features therein and identifying features within the
nucleic acid code(s) or polypeptide code(s) with the computer program.
In one embodiment, the computer program comprises a computer program
which identifies open reading frames. In a further embodiment, the
computer program identifies structural motifs in a polypeptide
sequence. In another embodiment, the computer program comprises a
molecular modeling program. The method may be performed by reading a
single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 50 of the
nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the
polypeptide codes of SEQ ID NOs: 4101-8177 through the use of the
computer program and identifying features within the nucleic acid
codes or polypeptide codes with the computer program.
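
One of the features named above, the open reading frame, can be located with a short program. The following is a minimal sketch under stated assumptions: it scans only the forward strand, treats ATG as the sole start codon and TAA, TAG and TGA as stop codons, and reports each start codon together with the first in-frame stop. The function name `find_orfs`, the `min_codons` parameter and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Find forward-strand open reading frames in a DNA sequence.

    Returns (start_index, orf) tuples, where start_index is 0-based and
    orf runs from ATG through the first in-frame stop codon. min_codons
    counts the coding codons before the stop, including the ATG.
    Nested ATGs in the same frame yield overlapping entries.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, seq[start:pos + 3]))
                break
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))
```

A fuller treatment would also scan the reverse complement and would typically apply a much larger minimum length than the toy default used here.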
[0508] The nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or
the polypeptide codes of SEQ ID NOs: 4101-8177 may be stored and
manipulated in a variety of data processor programs in a variety

WORDPERFECT or as an ASCII file in a variety of database programs
familiar to those of skill in the art, such

S els o. SEQ .0 NOs: 
4101-8177. The following list is intended not to limit the invention
but to provide guidance to programs and databases which are useful
with the nucleic acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or
the polypeptide codes of SEQ ID NOs: 4101-8177. The programs and
databases which may be used include, but are not limited to:
MacPattern (EMBL), DiscoveryBase (Molecular Applications Group),
GeneMine (Molecular Applications Group), Look (Molecular Applications
Group), MacLook (Molecular Applications Group), BLAST and BLAST2
(NCBI), BLASTN and BLASTX (Altschul et al,

SHAPE (Molecular Simulations Inc.), Cerius2.DBAccess (Molecular
Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight II
(Molecular Simulations Inc.), Discover (Molecular Simulations Inc.),
CHARMm (Molecular Simulations Inc.), Felix (Molecular Simulations
Inc.), DelPhi (Molecular Simulations Inc.), QuanteMM (Molecular
Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler
(Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.),
Quanta/Protein Design (Molecular Simulations Inc.), WebLab (Molecular
Simulations Inc.), WebLab Diversity Explorer (Molecular Simulations
Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular
Simulations Inc.), the EMBL/Swissprotein database, the MDL Available
Chemicals Directory database, the MDL Drug Data

try database, Derwent's World Drug Index database, the
BioByteMasterFile database, the Genbank database, and the Genseqn
database. Many other programs and data bases would be apparent to one
of skill in the art given the present disclosure.
[0509] Motifs which may be detected using the above programs include
sequences encoding leucine zippers, helix-turn-helix motifs,
glycosylation sites, ubiquitination sites, alpha helices, and beta
sheets, signal sequences encoding signal peptides which direct the
secretion of the encoded proteins, sequences implicated in
transcription regulation such as homeoboxes, acidic stretches,
enzymatic active sites, substrate binding sites, and enzymatic
cleavage sites.
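
Two of the motif types listed above can be illustrated with simple pattern searches over a polypeptide sequence in single character format. The patterns used here are assumptions introduced for this sketch, namely the commonly cited N-glycosylation consensus N-{P}-[ST]-{P} and the leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L; they are not taken from the programs named in the text, and a match indicates only a candidate site. The names `MOTIFS` and `find_motifs` are hypothetical.

```python
import re

# Lookahead groups allow overlapping matches to be reported.
MOTIFS = {
    "N-glycosylation site": re.compile(r"(?=(N[^P][ST][^P]))"),
    "leucine zipper": re.compile(r"(?=(L.{6}L.{6}L.{6}L))"),
}


def find_motifs(protein):
    """Return (motif_name, position, matched_text) for each candidate site.

    Positions are counted from 1. Pattern matches are candidates only
    and say nothing about whether the site is used in vivo.
    """
    protein = protein.upper()
    return [
        (name, match.start() + 1, match.group(1))
        for name, pattern in MOTIFS.items()
        for match in pattern.finditer(protein)
    ]


if __name__ == "__main__":
    print(find_motifs("MKNGSAL"))
```

Additional motif classes could be added to the dictionary as further regular expressions, provided a reliable consensus pattern is available for them.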



EXAMPLE 61

Methods of Making Nucleic Acids

[0510] The present invention also comprises methods of making the
EST-related nucleic acids, fragments of EST-related nucleic acids,
positional segments of the EST-related nucleic acids, or fragments of
positional segments of the EST-related nucleic acids. The methods
comprise sequentially linking together nucleotides to produce the
nucleic acids having the preceding sequences. A variety of methods of
synthesizing nucleic acids are known to those skilled in the art.
[0511] In many of these methods, synthesis is conducted on a solid
support. These include the 3' phosphoramidite methods in which the 3'
terminal base of the desired oligonucleotide is immobilized on an
insoluble carrier. The nucleotide base to be added is blocked at the
5' hydroxyl and activated at the 3' hydroxyl so as to cause coupling
with the immobilized nucleotide base. Deblocking of the new
immobilized nucleotide compound and repetition of the cycle will
produce the desired polynucleotide. Alternatively, polynucleotides may
be prepared as described in U.S. Patent No. 5,049,656. In some
embodiments, several polynucleotides prepared as described above are
ligated together to generate longer polynucleotides having a desired
sequence.

EXAMPLE 62

Methods of Making Polypeptides

[0512] The present invention also comprises methods of making the
polypeptides encoded by EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of the EST-related
nucleic acids, or fragments of positional segments of the EST-related
nucleic acids and methods of making the EST-related polypeptides,
fragments of EST-related polypeptides, positional segments of
EST-related polypeptides, or fragments of EST-related polypeptides.
The methods comprise sequentially linking together amino acids to
produce the polypeptides having the preceding sequences. In some
embodiments, the polypeptides made by these methods are 150 amino
acids or less in length. In other embodiments, the polypeptides made
by these methods are 120 amino acids or less in length.
[0513] A variety of methods of making polypeptides are known to those
skilled in the art, including methods in which the carboxyl terminal
amino acid is bound to benzene or another suitable resin. The amino
acid to be added possesses blocking groups on its amino moiety and any
side chain reactive groups so that only its carboxyl moiety can react.
The carboxyl group is activated with carbodiimide or another
activating agent and allowed to couple to the immobilized amino acid.
After removal of the blocking group, the cycle is repeated to generate
a polypeptide having the desired sequence. Alternatively, the methods
described in U.S. Patent No. 5,049,656 may be used.
[0514] As discussed above, the EST-related nucleic acids, fragments of
the EST-related nucleic acids, positional segments of the EST-related
nucleic acids, or fragments of positional segments of the EST-related
nucleic acids can be used for various purposes. The polynucleotides
can be used to express recombinant protein for analysis,
characterization or therapeutic use; production of secreted
polypeptides or chimeric polypeptides; antibody production; as markers
for tissues in which the corresponding protein is preferentially
expressed (either constitutively or at a particular stage of tissue
differentiation or development or in disease states); as molecular
weight markers on Southern gels; as chromosome markers or tags (when
labeled) to identify chromosomes or to map related gene positions; to
compare with endogenous DNA sequences in patients to identify
potential genetic disorders; as probes to hybridize and thus discover
novel, related DNA sequences; as a source of information to derive PCR
primers for genetic fingerprinting; for selecting and making oligomers
for attachment to a "gene chip" or other support, including for
examination for expression patterns; to raise anti-protein antibodies
using DNA immunization techniques; and as an antigen to raise anti-DNA
antibodies or elicit another immune response. Where the polynucleotide
encodes a protein or polypeptide which binds or potentially binds to
another protein or polypeptide (such as, for example, in a
receptor-ligand interaction), the polynucleotide can also be used in
interaction trap assays (such as, for example, that described in
Gyuris et al., Cell 75:791-803 (1993)) to identify polynucleotides
encoding the other protein or polypeptide with which binding occurs or
to identify inhibitors of the binding interaction.

[0515] The proteins or polypeptides provided by the present invention
can similarly be used in assays to determine biological activity,
including in a panel of multiple proteins for high-throughput
screening; to raise antibodies or to elicit another immune response;
as a reagent (including the labeled reagent) in assays designed to
quantitatively determine levels of the protein (or its receptor) in
biological fluids; as markers for tissues in which the corresponding
protein is preferentially expressed (either constitutively or at a
particular stage of tissue differentiation or development or in a
disease state); and, of course, to isolate correlative receptors or
ligands. Where the protein or polypeptide binds or potentially binds
to another protein or polypeptide (such as, for example, in a
receptor-ligand interaction), the protein can be used to identify the
other protein with which binding occurs or to identify inhibitors of
the binding interaction. Proteins or polypeptides involved in these
binding interactions can also be used to screen for peptide or small
molecule inhibitors or agonists of the binding interaction.
[0516] Any or all of these research utilities are capable
of being developed into reagent grade or kit format for
commercialization as research products.
[0517] Methods for performing the uses listed above
are well known to those skilled in the art. References
disclosing such methods include without limitation
"Molecular Cloning: A Laboratory Manual", 2d ed., Cold
Spring Harbor Laboratory Press, Sambrook, J., E.F.
Fritsch and T. Maniatis eds., 1989, and "Methods in
Enzymology: Guide to Molecular Cloning Techniques",
Academic Press, Berger, S.L. and A.R. Kimmel eds., 1987.
[0518] Polynucleotides and proteins or polypeptides
of the present invention can also be used as nutritional
sources or supplements. Such uses include without
limitation use as a protein or amino acid supplement, use
as a carbon source, use as a nitrogen source and use
as a source of carbohydrate. In such cases the protein
or polynucleotide of the invention can be added to the
feed of a particular organism or can be administered as
a separate solid or liquid preparation, such as in the form
of powder, pills, solutions, suspensions or capsules. In
the case of microorganisms, the protein or polynucleotide
of the invention can be added to the medium in or
on which the microorganism is cultured.
[0519] Although this invention has been described in
terms of certain preferred embodiments, other embodiments
which will be apparent to those of ordinary skill
in the art in view of the disclosure herein are also within
the scope of this invention. Accordingly, the scope of the
invention is intended to be defined only by reference to
the appended claims.



Claims 

1. A purified nucleic acid comprising a sequence
selected from the group consisting of SEQ ID NOs:
24-4100 and SEQ ID NOs: 8178-36681 and sequences
complementary to the sequences of SEQ
ID NOs: 24-4100 and SEQ ID NOs: 8178-36681.

2. A purified nucleic acid comprising at least 10
consecutive nucleotides of a sequence selected from
the group consisting of SEQ ID NOs: 24-4100 and
SEQ ID NOs: 8178-36681 and sequences complementary
to the sequences of SEQ ID NOs: 24-4100
and SEQ ID NOs: 8178-36681.

3. A purified nucleic acid comprising at least 15
consecutive nucleotides of a sequence selected from
the group consisting of SEQ ID NOs: 24-4100 and
SEQ ID NOs: 8178-36681 and sequences complementary
to the sequences of SEQ ID NOs: 24-4100
and SEQ ID NOs: 8178-36681.

4. A purified nucleic acid comprising the coding se- 
quence of a sequence selected from the group con- 
sisting of SEQ ID NOs: 24-4100. 

5. A purified nucleic acid comprising the full coding se- 
quences of a sequence selected from the group 
consisting of SEQ ID NOs: 3721-3811 wherein the 
full coding sequence comprises the sequence en- 
coding the signal peptide and the sequence encod- 
ing the mature protein. 

6. A purified nucleic acid comprising a contiguous 
span of a sequence selected from the group con- 
sisting of SEQ ID NOs: 3721-3811 which encodes 
the mature protein. 

7. A purified nucleic acid comprising a contiguous
span of a sequence selected from the group consisting
of SEQ ID NOs: 24-652 and 3721-3811
which encode the signal peptide.

8. A purified nucleic acid encoding a polypeptide com- 
prising a sequence selected from the group consist- 
ing of the sequences of SEQ ID NOs: 4101-8177. 

9. A purified nucleic acid encoding a polypeptide com- 
prising a sequence selected from the group consist- 
ing of the sequences of SEQ ID NOs: 7798-7888. 

10. A purified nucleic acid encoding a polypeptide
comprising a mature protein included in a sequence
selected from the group consisting of the sequences
of SEQ ID NOs: 7798-7888.

11. A purified nucleic acid encoding a polypeptide com- 
prising a signal peptide included in a sequence se- 
lected from the group consisting of the sequences 
of SEQ ID NOs: 4101-4729 and 7798-7888. 



12. A purified nucleic acid at least 15 nucleotides in
length which hybridizes under stringent conditions
to a sequence selected from the group consisting
of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 and sequences complementary to the
sequences of SEQ ID NOs: 24-4100 and SEQ ID
NOs: 8178-36681.

13. A purified or isolated polypeptide comprising a
sequence selected from the group consisting of the
sequences of SEQ ID NOs: 4101-8177.

14. A purified or isolated polypeptide comprising a
sequence selected from the group consisting of SEQ
ID NOs: 7798-7888.

15. A purified or isolated polypeptide comprising a
mature protein of a polypeptide selected from the
group consisting of SEQ ID NOs: 7798-7888.

16. A purified or isolated polypeptide comprising a
signal peptide of a sequence selected from the group
consisting of the polypeptides of SEQ ID NOs:
4101-4729 and 7798-7888.

17. A purified or isolated polypeptide comprising at
least 10 consecutive amino acids of a sequence
selected from the group consisting of the sequences
of SEQ ID NOs: 4101-8177.

18. A method of making a cDNA comprising the steps
of:

contacting a collection of mRNA molecules
from human cells with a primer comprising at
least 15 consecutive nucleotides of a sequence
selected from the group consisting of the
sequences complementary to SEQ ID NOs:
24-4100 and SEQ ID NOs: 8178-36681;
hybridizing said primer to an mRNA in said
collection that encodes said protein;
reverse transcribing said hybridized primer to
make a first cDNA strand from said mRNA;
making a second cDNA strand complementary
to said first cDNA strand; and
isolating the resulting cDNA encoding said
protein comprising said first cDNA strand and said
second cDNA strand.

19. A purified cDNA obtainable by the method of Claim
18.

20. The cDNA of Claim 19 wherein said cDNA encodes
at least a portion of a human polypeptide.

21. A method of making a cDNA comprising the steps
of:

obtaining a cDNA comprising a sequence
selected from the group consisting of SEQ ID
NOs: 24-4100 and SEQ ID NOs: 8178-36681;
contacting said cDNA with a detectable probe
comprising at least 15 consecutive nucleotides
of a sequence selected from the group consisting
of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 and the sequences complementary
to SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 under conditions which permit said
probe to hybridize to said cDNA;
identifying a cDNA which hybridizes to said
detectable probe; and
isolating said cDNA which hybridizes to said
probe.

22. A purified cDNA obtainable by the method of Claim
21.

23. The cDNA of Claim 22 wherein said cDNA encodes
at least a portion of a human polypeptide.

24. A method of making a cDNA comprising the steps
of:

contacting a collection of mRNA molecules
from human cells with a first primer capable of
hybridizing to the polyA tail of said mRNA;
hybridizing said first primer to said polyA tail;
reverse transcribing said mRNA to make a first
cDNA strand;
making a second cDNA strand complementary
to said first cDNA strand using at least one
primer comprising at least 15 consecutive
nucleotides of a sequence selected from the
group consisting of SEQ ID NOs: 24-4100 and
SEQ ID NOs: 8178-36681; and
isolating the resulting cDNA comprising said
first cDNA strand and said second cDNA
strand.

25. A purified cDNA obtainable by the method of Claim
24.

26. The cDNA of Claim 25 wherein said cDNA encodes
at least a portion of a human polypeptide.



27. The method of Claim 24, wherein the second cDNA
strand is made by:

contacting said first cDNA strand with a first pair
of primers, said first pair of primers comprising
a second primer comprising at least 15 consecutive
nucleotides of a sequence selected from
the group consisting of SEQ ID NOs: 24-4100
and SEQ ID NOs: 8178-36681 and a third primer
having a sequence therein which is included
within the sequence of said first primer;
performing a first polymerase chain reaction
with said first pair of primers to generate a first
PCR product;
contacting said first PCR product with a second
pair of primers, said second pair of primers
comprising a fourth primer, said fourth primer
comprising at least 15 consecutive nucleotides
of said sequence selected from the group consisting
of SEQ ID NOs: 24-4100 and SEQ ID
NOs: 8178-36681, and a fifth primer, wherein
said fourth and fifth primers hybridize to sequences
within said first PCR product; and
performing a second polymerase chain reaction,
thereby generating a second PCR product.

28. A purified cDNA obtainable by the method of Claim
27.

29. The cDNA of Claim 28 wherein said cDNA encodes
at least a portion of a human polypeptide.

30. The method of Claim 24 wherein the second cDNA
strand is made by:

contacting said first cDNA strand with a second
primer comprising at least 15 consecutive
nucleotides of a sequence selected from the
group consisting of SEQ ID NOs: 24-4100 and
SEQ ID NOs: 8178-36681;
hybridizing said second primer to said first
strand cDNA; and
extending said hybridized second primer to
generate said second cDNA strand.

31. A purified cDNA obtainable by the method of Claim
30.

32. The cDNA of Claim 28, wherein said cDNA encodes
at least a portion of a human polypeptide.

33. A method of making a polypeptide comprising the
steps of:

obtaining a cDNA which encodes a polypeptide
encoded by a nucleic acid comprising a
sequence selected from the group consisting of
SEQ ID NOs: 24-4100 or a cDNA which encodes
a polypeptide comprising at least 10 consecutive
amino acids of a polypeptide encoded
by a sequence selected from the group consisting
of SEQ ID NOs: 24-4100;
inserting said cDNA in an expression vector
such that said cDNA is operably linked to a
promoter;
introducing said expression vector into a host
cell whereby said host cell produces the protein
encoded by said cDNA; and
isolating said protein.

34. An isolated protein obtainable by the method of 
Claim 33. 

35. A method of obtaining a promoter DNA comprising
the steps of:
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=rass:?=5=S 

screening said genomic DNA to identify a promoter
capable of directing transcription initiation; and
isolating said DNA comprising said identified
promoter.

36. The method of Claim 35, wherein said obtaining

S&K "die sequenc 

SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681.

«f rtaim 36 wherein said screening 
37 The method of Claim jo, w. .rvatfid ud- 

ccn in NOs: 24-4100 and SEQ IU 
1,78-3668, into a promoter reporter vector. 

38. The method of Claim 36, wherein said screening
localed upstream or 

sites or transcription start sites. 

39. An isolated promoter obtainable by the method of
any one of Claims 34 to 38.

SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 and fragments comprising at least 15
consecutive nucleotides of said sequence.



es. 



42. The array of Claim 40 including therein at least to
sequences selected from the group consisting of
SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681, the sequences complementary to the
sequences of SEQ ID NOs: 24-4100 and SEQ ID
NOs: 8178-36681 and fragments comprising at
least 15 consecutive nucleotides of said sequences.

43. An enriched population of recombinant nucleic acids,
said recombinant nucleic acids comprising an
insert nucleic acid and a backbone nucleic acid,
wherein at least 5% of said insert nucleic acids in
said population comprise a sequence selected from
the group consisting of SEQ ID NOs: 24-4100 and
SEQ ID NOs: 8178-36681 and the sequences complementary
to SEQ ID NOs: 24-4100 and SEQ ID
NOs: 8178-36681.

44. A purified or isolated antibody capable of specifically
binding to a polypeptide comprising a sequence
selected from the group consisting of SEQ ID NOs:
4101-8177.

45. A purified or isolated antibody capable of specifically
binding to a polypeptide comprising at least 10
consecutive amino acids of a sequence selected
from the group consisting of SEQ ID NOs:
4101-8177.

46. An antibody composition capable of selectively
binding to an epitope-containing fragment of a
polypeptide comprising a contiguous span of at
least 8 amino acids of any of SEQ ID NOs:
4101-8177, wherein said antibody is polyclonal or
monoclonal.

47. A computer readable medium having stored thereon
a sequence selected from the group consisting
of a nucleic acid code of SEQ ID NOs: 24-4100 and
8178-36681 and a polypeptide code of SEQ ID
NOs: 4101-8177.

48. A computer system comprising a processor and a
data storage device wherein said data storage device
has stored thereon a sequence selected from
the group consisting of a nucleic acid code of SEQ
ID NOs: 24-4100 and 8178-36681 and a polypeptide
code of SEQ ID NOs: 4101-8177.



49. The computer system of Claim 48 further comprising
a sequence comparer and a data storage device
having reference sequences stored thereon.

50. The computer system of Claim 49 wherein said
sequence comparer comprises a computer program
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polypeptides. 

qUe ""' at , ( stseguencetoare1e.- X°P lid8S 

compos P'og^- 

andlherelerencesequ ^ 
poiymorp^s. ma5epu enoeso- 

a polypep^e code o 

compter program ^ ^ 
sequences, and sequence * sa.d 

identifying tenures 

computer program. 

-8— 

one oi Claims no 12- 
prising the steps ol. ^ ^ 

h of any ° ne 

-:;=^~&^ 

unking together tw> 55 



ids ' .j „i anv one ol 

. me <hoc. o. making a poW'P^ p ° tides * i» 
59. A metnoa o ^ sa , d p^ypep , 

^aX.eng^iesscornpns.ng 
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EP 1 033 401 A2 




[Figure, flow chart labels: Sequences; Determination of consensus sequences; Extension of consensus sequences; Identification of ORFs; Consensus sequences; ORFs]

[Figure labels: Reverse transcription; Nested PCR]
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Description of Tran S cnp..on Factor 
SignalTagseq»« nces 



MatrU 



-502 

CMYB_0I 5Q i 

MYOD.Q6 ^ 44 

S8.01 ^25 

S8 01 rti .390 

DELTAEFlJlt ^ 

GATA.C .349 

CMYB 01 _ 343 

GATAl_02 339 + 

TAL1BETAE47_01 '235 + 

MYOD.Q6 2i7 

GATA1.04 , 26 

IK1.01 .126 

iwjm . l23 + 

CREL 01 . 96 

GATA1_02 ^ 

SRY_02 .33 + 

E2F 02 .5 

MZF1_01 

Promoter .equenee flSWgJ ™ 0 ^ on 
Matrix 



0.983 
0.961 
0.960 
0966 
0 960 
0.964 
0.958 
0959 
0.953 
0.973 
0.983 
0.978 
0.954 
0.953 
0.963 
0.985 
0962 
0.950 
0-951 
0.957 
0.975 



Length Sequence 

9 TGTCAGTTG 

10 CCCAACTGAC 



11 foAffi- 

• ^Sggaca 

!I cataacagatggtaag 

< cataacagatggtaag 

6 caIaacagatggtaag 

ft accatctgtt 

? TCAAGATAAAGTA 

\\ Icttoggaattcc 

2 agttgggaattc 

0 TGGGAATTCC 

S TCAGTGATATGGCA 

2 TAAAACAAAACA 
r TTTAGCGC 



TAAAA<-«^ 
g TTTAGCGC 
g TGAGGGGA 



Score Untth Sequence 



NFY_Q° 

MZF1_01 

CMYB.Ol 

VMYB_02 

STATJH 

STAT.01 

MZF1.01 

IK2 01 

MZF1.01 

SRY 02 

MZFl.Ol 

b4YOD_Q6 

DELTAEF1_0) 

S8 01 
MZF1_0» 

Promoter iequenc 
Matrix 

ARNT_01 

NMYC_0l 

USF 01 

USFlOl 

NMYC_0l 

MYCMAX.02 

USF_C 

USF_C 

MZF1_01 

ELK I. 02 

CETS1P54_01 

API Q* 

AP1FJ_Q 2 

PADS,C 



.748 
-738 
-684 
.682 
-673 
-673 
-556 
-451 
^24 
-398 
-216 
.190 
.176 
5 

16 



0.956 
0.962 
0.994 
0.985 
0.968 
0.951 
0.956 
0.965 
0.986 
0.955 
0.960 
0.981 
0.958 
0.992 
0.986 



p29B6(555bp>: 
Position 


Orientation 


Score 






0.964 


-311 


+ 


0.965 


-309 


+ 


0.985 


-309 




0.985 


-309 




0-956 


.309 




0.972 


-309 


4- 


0.997 


-307 




0.991 


-307 




0.968 


-292 


+ 


0.963 


-105 




0.974 


-102 




0.963 


ua 




0.961 


-42 


+ 


1.000 


45 







, , GGACCAATCAT 

8 CCTGGGGA 

9 TGACCGTTG 
9 TCCAACGGT 
9 TTCCTGGAA 

0 TTCCAGGAA 

r TTGGGGGA 

.? GAATGGGATTTC 

1 AGAGGGGA 
,2 GAAAACAAAACA 
\ GAAGGGGA 

,5 AGCATCTGCC 

1 TCCCACCTTCC 
GAGGCAATTAT 
g AGAGGGGA 

Length Sequence 

iGACTC 

..CTCACGTGCIU 
ACTCACGTGCTG 
o CAGCACGTGAGT 
O CAGCACGTGAGT 
12 CAGCACGTGAGT 
o TCACGTGC 
8 GCACGTGA 
o CATGGGGA 

,5 ctctccggaagcct 
0 tccggaagcc 

n AGTGACTGAAC 
J AGTGACTGAAC 
9 TGTGGTCTC 

FIGURE 5 



^ngth Sequence 

, 6 GG ACTC ACGTGCTGCT 

2 ACTCACGTGCTG 

« ACTCACGTGCTG 



Location in: 
SEQ1DNO: 17 
17-25 

complement ofl 8-27 
complement or 7>-s •> 
94-104 

cornplcment of 129-139 

complement oil 55-103 

170-178 

176-189 

180-190 

284-299 

284-299 

complement of 302-314 

393-405 

393^04 

396-405 

423^36 

complement of 478-489 
complement of 5 14-52 1 

Location in: 
SEQ1DNO:20 
complement of 60-70 

70-77 

^men. of 126-134 

135-143 m1 ..., 
complement ofl3S-143 

complement of 252-259 
357-368 

cowmen, of4t0-42l 

592-599 
618-627 

cornplemem of 813-823 
complement of 824-831 

Location in: 
SEQ ID NO: 23 
191-206 
193-204 

complement of 210-217 
397-410 

complement of 460-470 
547-555 
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